Preface


This manual is part of the Storage Subsystem Library (SSL) — a set of manuals that 
provides information about the hardware components of IBM disk storage 
subsystems. Although the SSL is concerned primarily with 3380 and 3390 direct 
access storage devices (DASD), information is provided for the following DASD 
types as well: 3330, 3333, 3340, 3344, 3350, 3370, 3375, 9332, and 9335. 


This manual is written for the storage administrator, system programmer, or 
hardware performance specialist who is responsible for providing and maintaining 
required levels of storage subsystem availability and performance. 


The information in this book helps you understand the types of errors that require
your attention. It also tells you what tools are available to help you evaluate error
situations and select the appropriate corrective action. By using the disk storage
media maintenance facilities described here, you can:


• Ensure data accessibility for your user community and applications
• Reduce the time spent in problem determination
• Anticipate and fix problems before serious errors occur
• Reduce the number of calls for outside service
• Reduce performance degradation caused by disk storage errors.


The information applies generally to the MVS, VM, and VSE operating environments. 


About This Book 


You can use this book either as a guide for handling specific error conditions, or as 
a learning tool for understanding and maintaining the disk storage subsystems at 
your processing complex. 


| This manual contains:


| • “Chapter 1. Introduction” on page 15, describes the characteristics of errors
|   and the handling of checks.


| • “Chapter 2. The Error Handling Process” on page 21, explains the respective
|   roles of the operating system and the storage subsystem in detecting errors and
|   recovering from problems. This chapter recommends routine practices for
|   maintaining disk storage media effectively. In addition, this chapter introduces
|   concurrent media maintenance.


• “Chapter 3. Performing Media Maintenance on SIM DASD” on page 27,
  describes the use of service information messages (SIMs) and the Service
  Information Messages report. This chapter also provides step-by-step
  procedures for performing media maintenance for devices that produce SIMs
  and describes how you can automate the media maintenance process.


• “Chapter 4. Performing Media Maintenance on Non-SIM DASD” on page 43,
  describes how to handle errors for devices that do not produce SIMs. This
  chapter emphasizes the use of the System Exception Reports produced by the
  Environmental Record Editing and Printing (EREP) program for disk storage
  activity.


• “Chapter 5. Device Support Facilities” on page 57, describes the specific
  Device Support Facilities commands and functions that you can use in
  correcting disk media errors. This chapter provides guidelines for using
  concurrent media maintenance, and also describes how to use Device Support
  Facilities to initiate the procedure.


• Appendix A, “Background Information” on page 71, contains background
  material on several concepts related to direct access storage devices and the
  storage subsystem, and is included to give you an understanding of how direct
  access storage devices work.


• Appendix B, “Specific Guidelines by DASD Type” on page 81, provides the
  explicit details for using the generally applicable instructions in Chapters 2, 3, 4,
  and 5.


• “Glossary” on page 117, lists the terms used in the Storage Subsystem Library
  manuals.


• “Bibliography” on page 121, lists the related publications that you may want to
  refer to while performing the tasks in this manual.


Tips on Using this Book 


If you are interested in learning about error handling and maintenance processes, 
you should read all the chapters in this manual. If you are not familiar with EREP 
and Device Support Facilities, you should consider reading Chapters 2 and 5 before 
you read Chapters 3 and 4. 


If you are not acquainted with disk storage technology, you should start by reading 
Appendix A, “Background Information” on page 71 to gain a basic understanding of 
disk storage hardware and its operation. 


In most cases, the body of this book (Chapters 1 through 5) and Appendix A, 
“Background Information” apply to all of the direct access storage devices listed at 
the beginning of this preface. Information concerning specific DASD types is located 
in Appendix B, “Specific Guidelines by DASD Type” on page 81. Some of the 
maintenance and error handling features unique to the newer 3380 and 3390 models 
are presented in the main chapters; the applicability of such material is clearly 
identified. 


If you want to look up information related to a specific error condition on a particular
type of DASD, you will find the index, the contents, and the partial contents at the
beginning of Appendix B, “Specific Guidelines by DASD Type” on page 81 helpful
in locating information quickly.


The comparison table of EREP reports (Figure 19 on page 46) can help you to 
quickly identify the report you need for a specific situation for devices that do not 
produce SIMs. For devices that produce SIMs, the Service Information Messages 
Report is the only report necessary for identifying error situations. 
Terminology


A glossary is provided at the back of this book; however, the following terms have a 
precise meaning in this book: 


controller refers to the hardware component of a DASD head-of-string unit 
which provides the path control and data transfer functions (sometimes referred 
to as a device adapter for the 3390). 


DASD SIM refers to a SIM that informs you of a hardware error that has 
occurred on the 3390 requiring the attention of your IBM service representative. 
A DASD SIM is also referred to as a SERVICE ALERT on the EREP System 
Exception reports. The term DASD SIM will be used throughout this book unless 
there is a specific example of a SERVICE ALERT on a System Exception Report. 


device adapter performs the controller functions in a 3390 A-unit, and is referred 
to as controller throughout this book. 


ERDS refers to the area in which error records are logged for any operating 
environment. For example, ERDS information is stored in SYS1.LOGREC by 
MVS, in SYSREC by VSE, and in the error recording area by VM. 


media SIM refers to a SIM informing you that an error that requires your 
attention has occurred on the 3390 disk media. 


non-SIM DASD refers to a direct access storage device that does not produce 
SIMs, for example, 3380 and prior DASD. 


operating system refers to the MVS, VM, or VSE operating environment. The 
term is sometimes shortened to “system” in this book. 


SIM refers to a service information message generated by the 3990-3390
subsystem to indicate that an abnormal condition exists and service action is
required. This book discusses SIMs that relate to the 3390 only. For a
discussion of SIMs that relate to the 3990, see the IBM 3990 Storage Control
Planning, Installation, and Storage Administration Guide.


SIM Alert refers to the message that is received at the operator console when a 
SIM is generated. The term SIM Alert refers to both MEDIA ALERT and DASD 
ALERT. 


SIM DASD refers to a direct access storage device that produces SIMs, 
specifically a 3390. 


storage subsystem refers to a storage control and its attached storage devices.
The term is sometimes shortened to subsystem in this book.


The Storage Subsystem Library 


The SSL provides information for using DASD and storage control units. This library 
describes features, characteristics, capabilities and configuration options for the 
storage devices. In addition, the library offers detailed instructions for planning 
configurations, installing devices, manipulating data and managing the use of 
devices effectively in various IBM operating environments (MVS, VM and VSE). 
The SSL also specifies software requirements for storage devices and offers
direction on using required system products as well as optional storage 
management tools, such as DFDSS in the MVS environment. The SSL complements 
related software libraries and the MVS Storage Management Library by providing 
device-specific information on managing storage resources. 


Figure 1 on page 5 shows the relationship among the SSL books (and subsets) in 
terms of high-level tasks described in each book. Following that, under “Library 
Overview” on page 6, the books of each subset are summarized providing an 
overview of their contents. Each book contains a glossary of terms, a bibliography, 
and an index of its contents. 
Figure 1. The Storage Subsystem Library Publications

Publications named in the figure include:

IBM 3990 Storage Control Reference, GA32-0099
IBM 3990 Operations and Recovery Reference, GA32-0133
Maintaining IBM Storage Subsystem Media, GC26-4495
IBM 3380 Direct Access Storage Reference Summary (card), GX26-1678
IBM 3390 Direct Access Storage Introduction, GC26-4573
Using IBM 3390 Direct Access Storage in an MVS Environment, GC26-4574
Using IBM 3390 Direct Access Storage in a VM Environment, GC26-4575
Using IBM 3390 Direct Access Storage in a VSE Environment, GC26-4576
IBM 3390 Direct Access Storage Reference Summary (booklet), GX26-4577

Preface 5 


Library Overview 


3390 Direct Access Storage Publications 
The 3390 subset of the SSL includes: 


• IBM 3390 Direct Access Storage Introduction, GC26-4573


Provides a complete description of each 3390 model, including characteristics, 
features, and capabilities. In addition, configuration and attachment options are 
described along with related information to help you design a subsystem to 
meet your needs. 


• Using IBM 3390 Direct Access Storage in an MVS Environment, GC26-4574


Provides specific guidance for using 3390s in an MVS operating environment. 
The book provides detailed instruction for planning the addition of new 3390 
devices, installing devices, moving data to new devices, and performing 
ongoing storage subsystem management. 


• Using IBM 3390 Direct Access Storage in a VM Environment, GC26-4575


Provides specific guidance for using 3390s in a VM operating environment. The 
book provides detailed instruction for planning the addition of new 3390s, 
installing devices, moving data to new devices, and performing ongoing storage 
subsystem management. In addition, this book discusses storage 
considerations related to guest systems. 


• Using IBM 3390 Direct Access Storage in a VSE Environment, GC26-4576


Provides specific guidance for using 3390s in a VSE operating environment. The 
book provides instruction for planning the addition of new 3390s, installing 
devices, moving data to new devices, and performing ongoing storage 
subsystem management. 


• IBM 3390 Direct Access Storage Reference Summary, GX26-4577


Provides a summary of 3390 capacity, performance, and operating 
characteristics in a compact, portable card form. 


3990 Storage Control Publications
The 3990 subset of the SSL includes: 


• IBM 3990 Storage Control Introduction, GA32-0098


Provides a complete description of the various models of the 3990 Storage 
Control, including its data availability, performance, and reliability 
improvements over previous storage controls. In addition, the book provides 
descriptions of the configuration attachment options, optional features, 
performance characteristics, and software support of the 3990 Storage Control. 


• IBM 3990 Storage Control Planning, Installation, and Storage Administration
Guide, GA32-0100


Provides a functional description of the 3990 Storage Control. The book
describes the planning, program installation, and storage management tasks
used in typical environments. Configuration examples and sample programs for
controlling the various functions of the 3990 Storage Control are provided.


• IBM 3990 Storage Control Reference, GA32-0099


Provides descriptions and reference information for the 3990 Storage Control. 
The book contains channel commands, error recovery, and sense information. 


• Cache Device Administration, GC35-0101


Specifies the access method services tools for administering a cache device 
under MVS. The book supports the following storage controls: 3990 Model 3, 
3880 Model 23, 3880 Model 21, 3880 Model 13, and 3880 Model 11. 


| • IBM 3990 Operations Study Guide, GA32-0131


| A study guide for operators of 3990 storage subsystems. Provides general
| information on system control program commands and messages, and
| guidelines for basic problem determination.


| • IBM 3990 Operations and Recovery Reference, GA32-0133


| A user’s guide for operators of 3990 storage subsystems. Provides general
| guidelines for 3990 problem determination, testing of 3990 extended functions,
| and also provides recommended recovery actions for the 3990.


| • Introduction to Nonsynchronous Direct Access Storage Subsystems, GC26-4519


| Provides specific information for programmers responsible for writing DASD
| channel programs that operate in a nonsynchronous environment. This book
| defines synchronous and nonsynchronous operations, explains ECKD data
| transfer commands, and provides examples of using ECKD commands to build
| nonsynchronous channel programs.


3380 Direct Access Storage Publications 
The 3380 subset of the SSL includes: 


• IBM 3380 Direct Access Storage Introduction, GC26-4491


Provides a complete description of the various models of the 3380, including 
characteristics, features, and capabilities. In addition, the configuration and 
attachment options are described along with other information that helps in 
designing a storage subsystem to meet your needs. This book does not cover 
3380 Model CJ2.


• IBM 3380 Direct Access Storage Direct Channel Attach Model CJ2 Introduction
and Reference, GC26-4497 


Provides a complete description of the 3380 direct channel attach Model CJ2 
characteristics, features, capabilities, and string configuration options. 


• Using the IBM 3380 Direct Access Storage in an MVS Environment, GC26-4492


Provides specific guidance for using the 3380 in an MVS environment. The book 
provides detailed instruction for planning the addition of new 3380 devices from 
a logical and physical point of view, installing devices, moving data to new 
devices, and performing some ongoing activities to maintain a reliable storage 
subsystem. 


• Using the IBM 3380 Direct Access Storage in a VM Environment, GC26-4493


Provides specific guidance for using the 3380 in a VM operating environment. 
The book provides detailed instruction for planning the addition of new 3380 
devices, installing devices, moving data to new devices, and performing 
ongoing storage management activities to maintain reliable performance and 
availability. In addition, this manual discusses storage considerations related to 
guest systems. 


• Using the IBM 3380 Direct Access Storage in a VSE Environment, GC26-4494


Provides specific guidance for using the 3380 in a VSE operating environment. 
The book provides instruction for planning the addition of new 3380 devices, 
installing devices, moving data to new devices, and performing ongoing storage 
subsystem management. 


• IBM 3380 Direct Access Storage: Reference Summary, GX26-1678


Provides a summary of 3380 capacity, performance, and operating 
characteristics in a portable, compact card form. 


Storage Subsystem Library Shared Publications 
The following publications contain information relevant to the entire SSL. 


• Maintaining IBM Storage Subsystem Media, GC26-4495


Describes how the storage subsystem and the various operating systems handle
disk storage errors and provides instruction on using the EREP program and the
Device Support Facilities (ICKDSF) program to diagnose and correct disk media
errors. Recovery procedures are provided for the various device types. In
addition, background material on DASD concepts is included.


• Storage Subsystem Library Master Bibliography, Index, and Glossary,
GC26-4496 


Provides a central source for information related to storage subsystem topics. 
Books for IBM 3390 DAS, IBM 3380 DAS, and 3990 Storage Controls are indexed 
in this publication. The manual also includes an overview of the material in the 
Storage Subsystem Library. 


Storage Subsystem Library Ordering Information 


You can order the entire SSL or parts of it tailored to your hardware and software 
environment with bill of form numbers. 


3390 and 3990 Publications


You can obtain a copy of every manual in the 3390 and 3990 subsets of the SSL with 
one order number, GBOF-3124. Select one of the following bill of form numbers to 
obtain information tailored to your hardware and software environment. To obtain 
an individual manual, use its order number. 


        MVS     VM      VSE     3990    SSL
        GBOF-   GBOF-   GBOF-   GBOF-   GBOF-
Title   3121    3122    3123    0366    3124

IBM 3390 Direct Access Storage Introduction, GC26-4573
Using IBM 3390 Direct Access Storage in an MVS Environment, GC26-4574
Using IBM 3390 Direct Access Storage in a VM Environment, GC26-4575
Using IBM 3390 Direct Access Storage in a VSE Environment, GC26-4576
Maintaining IBM Storage Subsystem Media, GC26-4495
IBM 3390 Direct Access Storage Reference Summary, GX26-4577
IBM 3990 Storage Control Introduction, GA32-0098
IBM 3990 Storage Control Planning, Installation, and Storage Administration
Guide, GA32-0100
IBM 3990 Storage Control Reference, GA32-0099
Cache Device Administration, GC35-0101
IBM 3990 Operations Study Guide, GA32-0131
IBM 3990 Operations and Recovery Reference, GA32-0133
Introduction to Nonsynchronous Direct Access Storage Subsystems, GC26-4519
Storage Subsystem Library Master Bibliography, Index, and Glossary, GC26-4496
Binder and 3390 inserts, GX26-3777
Binder and 3990 inserts, GX26-3768


Figure 2. 3390 and 3990 Subsets


3380 and 3990 Publications


You can obtain a copy of every manual in the 3380 and 3990 subsets of the SSL 
using one General Bill of Forms (GBOF) number, GBOF-1762. Select one of the 
following bill of form numbers to obtain information tailored to your hardware and 
software environment. To obtain an individual manual, use its order number. 


MVS CJ2/MVS CJ2/VSE 3990
GBOF- GBOF- GBOF- GBOF-
Title 1756 1758 0366

IBM 3380 Direct Access Storage Introduction, GC26-4491
IBM 3380 Direct Access Storage Direct Channel Attach Model CJ2 Introduction
and Reference, GC26-4497
Using IBM 3380 Direct Access Storage in an MVS Environment, GC26-4492
Using IBM 3380 Direct Access Storage in a VM Environment, GC26-4493   X X
Using IBM 3380 Direct Access Storage in a VSE Environment, GC26-4494   X X
Maintaining IBM Storage Subsystem Media, GC26-4495
IBM 3380 Direct Access Storage Reference Summary, GX26-1678
IBM 3990 Storage Control Introduction, GA32-0098
IBM 3990 Storage Control Planning, Installation, and Storage Administration
Guide, GA32-0100
IBM 3990 Storage Control Reference, GA32-0099   X
Cache Device Administration, GC35-0101
IBM 3990 Operations Study Guide, GA32-0131
IBM 3990 Operations and Recovery Reference, GA32-0133
Introduction to Nonsynchronous Direct Access Storage Subsystems, GC26-4519
Storage Subsystem Library Master Bibliography, Index, and Glossary, GC26-4496
Binder and 3380 inserts, GX26-3767   X X X X
Binder and 3990 inserts, GX26-3768


Figure 3. 3380 and 3990 Subsets


10 Maintaining IBM Storage Subsystem Media 


Storage Subsystem Library Binders 
You can organize your library with binder kits. Kits consist of a binder with 
identifying cover and spine inserts for 3390, 3380, or 3990 manuals, and are included 
when you order the following numbers: 


GBOF-3121 through GBOF-3123 include a binder with 3390 inserts. 
GBOF-1756 through GBOF-1761 include a binder with 3380 inserts. 
GBOF-0366 includes a binder with 3990 inserts. 

GBOF-3124 includes binders and inserts for both 3990 and 3390. 
GBOF-1762 includes binders and inserts for both 3990 and 3380. 


Binder kits may also be ordered separately. 


Order number GX26-3777 contains a binder and 3390 inserts. 
Order number GX26-3767 contains a binder and 3380 inserts. 
Order number GX26-3768 contains a binder and 3990 inserts. 


Related Publications 


This manual is intended for use in conjunction with the following publications. 


Error Recovery Information 
IBM 3990 Operations and Recovery Reference, GA32-0133 


Provides guidelines for problem determination, testing of 3990 extended 
functions, and recommended recovery actions. 


Device Support Facilities 
Device Support Facilities User’s Guide and Reference, GC35-0033 


Contains information on command syntax, parameters and use of media 
maintenance functions for all IBM DASD types. This manual describes 
differences in executing maintenance operations in supported operating 
environments.


Device Support Facilities: Primer for the User of IBM Direct Access Storage, 
GC26-4498 


Provides an overview of the Device Support Facilities product and its intended 
use and capabilities with the IBM 3380 and 3390 family of direct access storage 
devices. 


EREP Information


Environmental Record Editing and Printing Program User's Guide and 
Reference, GC28-1378 


Provides information on how to obtain reports needed for routine maintenance 
and error handling. The EREP manual explains how to designate a physical ID 
that the EREP program can use and what unique device information you may 
need. This manual provides instructions for obtaining and reading the System 
Exception Reports. 


EREP Release Level 


The examples presented in this manual assume you are operating on EREP 
Version 3.4.1 or higher. 




• 3380 Information on Moving Data


Occasionally it is necessary to move data to another volume before performing
media maintenance activities. Refer to the appropriate Storage Subsystem
Library operating environment manual for your device type, listed in “The
Storage Subsystem Library” on page 3 for further references on copying data in
the MVS, VM or VSE environment.


• Storage Control Information


The following manuals provide information on error recovery procedures for the 
3880 Storage Control with descriptions of sense information and formats: 


— IBM 3880 Storage Control Models 1, 2, 3 and 4 Description Manual 
— IBM 3880 Storage Control Model 11 Description 
— IBM 3880 Storage Control Model 13 Description 
— IBM 3880 Storage Control Model 21 Description 
— IBM 3880 Storage Control Model 23 Description 


See the appropriate 3390 manual listed in Figure 1 on page 5 for information on 
error recovery procedures and descriptions of sense information and formats 
for 3990 storage controls. 


Other publications referenced in this manual that may provide additional related
information are included in the “Bibliography” on page 121. The bibliography
includes a short description of each publication.


Summary of Changes


Third Edition, September 1990 


Technical additions include: 


| • New information regarding ICKDSF Release 12, primarily related to directed I/O
|   for dual copy.


| • VSE/SP Version 4, Release 1.2 support for IBM 3390 Direct Access Storage.

| • Updates to media maintenance procedures in Appendix B.


In addition to the technical changes listed above, the following non-technical change
has been made:


| • The information presented in this book has been reorganized to separate
|   specific SIM and non-SIM information. This change will help to focus on new
|   information for future products, as it becomes available.


A vertical bar in the left margin indicates a specific change to the text. Vertical bars 
do not appear next to editorial changes that have no technical significance. 


Chapter 1. Introduction


Knowing what an error is and how to handle an error is important to maintaining
your IBM* storage subsystem. This book provides information about identifying
| types of errors and provides guidance in handling errors that can occur on the
| storage media, or disk surface. You need to know the types of errors that can occur,
| and how the error type should be handled.


Characteristics of an Error 


This section explains the various attributes of an error situation. Depending on
these specific attributes, or characteristics, the subsystem and system handle the
error by employing specific techniques. The following attributes, described in this
section, must be considered when evaluating any error situation:


• Type
• Recoverability


Additional characteristics and considerations of special interest for handling the 
data check type of errors are described in “Handling Data Checks” on page 17. 


Types of Errors 


The categories for describing disk storage errors are: data check, programming 
check, overrun, and equipment check. The category, or type, is indicated in the 
sense information, console messages, and reports described in this book. An error
type is similar to a symptom; it is evidence of a problem, but does not necessarily 
reveal either the source or the cause. 


Data Check: A data check is an error detected in the bit pattern read from the disk. 
Some data checks are caused by hardware, some are caused by media, and others 
are the result of random events such as transient electrical interference. A data 
check can occur as a result of: 


• A defect on the surface of the media
• An error when writing the data
• A hardware error when reading the data
• A random event.


Programming Check: A programming check, such as an invalid track format or 
incorrect record specification, causes a unit check. This type of error is indicated to 
either the system or subsystem, and is always returned to the requesting program. 


Overrun: An overrun is a condition that exists because the data cannot be received 
at the rate it is transmitted. It is usually the result of timing and usually will not 
recur if the I/O operation is retried. 


Equipment Check: An equipment check is an error detected in mechanical or 
electrical operation of the hardware. For the 3390, error conditions that cause seek 
checks in other DASDs are presented as equipment checks. 


* IBM is a trademark of the International Business Machines Corporation.


Understanding Error Recoverability


When a data or equipment error is detected by the subsystem, either the subsystem
or the operating system will attempt recovery, depending on the situation and the
type of hardware involved. The terms temporary and permanent refer to the
recoverability of the error.


Temporary: An error is temporary if subsystem or system error recovery 
procedures are successful. A temporary error is only seen by the subsystem or 
system, and is never returned to the application. 


Terminology 


The term error recording data set (ERDS) is used in this manual to refer to the
area in which error records are logged for any operating environment. For
example, ERDS information is stored in SYS1.LOGREC by MVS, in SYSREC by
VSE*, and in the error recording area by VM.


Some DASD types log all temporary errors to the ERDS. However, for the 3390, 
errors recovered by the subsystem are not logged at the host system. They are 
logged only when the subsystem requires the assistance of the system error 
recovery procedures. When necessary, the subsystem will generate a SIM, and 
send it to the operating system. 


Permanent: The term permanent has meaning from two different perspectives. 
• System view


From the system perspective, an error is permanent if error recovery 
procedures performed by the operating system or storage subsystem cannot 
recover from the error condition. 


• Application view


An error is permanent from an application perspective if an error indication 
must be returned to the application. The application is then responsible for 
determining how to deal with the error. 


A data check may be permanent to the system because it cannot be recovered on 
that path. However, when system error recovery procedures retry the operation 
from an alternate path, the operation may complete successfully. The application 
program would not be notified of the error, so the error would not be permanent to 
the application. On the other hand, if the system error recovery procedures are not 
successful, the data check is permanent from both the system and application points 
of view. 


Throughout this manual, permanent refers to errors from the system view, unless 
otherwise noted. 
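

As an illustration only, the two views of a permanent error can be expressed as a
short Python sketch. It assumes a simplified model in which system error recovery
tries each available path in turn and stops at the first path on which the operation
completes. The function name classify_permanence and its list-of-booleans input are
hypothetical and are not part of any operating system or subsystem interface.

```python
from typing import List, Tuple


def classify_permanence(path_outcomes: List[bool]) -> Tuple[bool, bool]:
    """Return (permanent_to_system, permanent_to_application).

    path_outcomes[i] is True if the operation completed on the i-th path
    tried. This input is a hypothetical stand-in; real recovery is driven
    by sense information, not by a list of booleans.
    """
    if not path_outcomes:
        raise ValueError("at least one path must be tried")

    tried = []
    for completed in path_outcomes:
        tried.append(completed)
        if completed:
            break  # recovery stops at the first path that succeeds

    # System view: the error could not be recovered on at least one path.
    permanent_to_system = not all(tried)
    # Application view: no path succeeded, so the error must be returned.
    permanent_to_application = not any(tried)
    return permanent_to_system, permanent_to_application
```

Under these assumptions, classify_permanence([False, True]) models a data check that
cannot be recovered on the first path but completes on an alternate path: it is
permanent on that path from the system view, but it is not permanent to the
application.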


* VSE is a trademark of the International Business Machines Corporation.




Recoverability by Error Type 


An error is recoverable if it is not seen as a permanent error by the application. 


| Data Check: When data is written, additional control information is recorded along
| with the data in order to enable data verification when the data is being read. This
| control information is the error checking and correcting (ECC) bytes. Along with the
ability to detect the data check, ECC bytes can provide sufficient information to
reconstruct the data in error. When this reconstruction occurs, the data check is
designated ECC-correctable by the subsystem. ECC-correctable data checks are
corrected either by the storage subsystem or by the operating system error
recovery procedures. ECC-correctable data checks are always recoverable.


When the ECC bytes are insufficient to reconstruct the data, the data check is 
designated ECC-uncorrectable by the subsystem. Then, either the storage 
subsystem or operating system will retry the I/O operation. If the retry is 
unsuccessful, the data check is unrecoverable. This is known as a permanent error. 


The techniques used to handle data checks and the role played by the subsystem 
and system differ for different DASD types. 
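

The decision sequence for a data check described above can be sketched as follows.
This is a minimal Python illustration, not the recovery logic of any particular
DASD type. The names DataCheckOutcome and resolve_data_check are hypothetical, and
the two boolean inputs stand in for what the ECC bytes and the retried I/O operation
would actually determine.

```python
from enum import Enum


class DataCheckOutcome(Enum):
    ECC_CORRECTED = "recoverable: data reconstructed from the ECC bytes"
    RETRY_RECOVERED = "recoverable: retry of the I/O operation succeeded"
    PERMANENT = "unrecoverable: permanent error"


def resolve_data_check(ecc_correctable: bool, retry_succeeds: bool) -> DataCheckOutcome:
    """Classify a data check using the sequence described in the text."""
    if ecc_correctable:
        # ECC-correctable data checks are always recoverable.
        return DataCheckOutcome.ECC_CORRECTED
    # ECC-uncorrectable: the subsystem or operating system retries the I/O.
    if retry_succeeds:
        return DataCheckOutcome.RETRY_RECOVERED
    return DataCheckOutcome.PERMANENT
```

Note that the retry result is consulted only for ECC-uncorrectable data checks, which
mirrors the order of the two paragraphs above.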


Overrun Errors and Equipment Checks: When an overrun or equipment check is
detected, the operation is retried a specific number of times, depending on the
DASD. If the operation is successful, the error is recoverable. If retry is not
| successful within that number of retries, the error is recorded as permanent.
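

The bounded retry described above can be sketched in Python. This is an
illustration under stated assumptions: the function name retry_after_check is
hypothetical, the operation is modeled as a callable that returns True when it
completes, and retry_limit is a placeholder because the actual number of retries
depends on the DASD and is not given here.

```python
from typing import Callable


def retry_after_check(operation: Callable[[], bool], retry_limit: int) -> str:
    """Retry an operation that has just failed with an overrun or equipment check.

    Returns "recoverable" if any retry completes, or "permanent" if none of
    the retry_limit retries completes. retry_limit is a placeholder value.
    """
    if retry_limit < 0:
        raise ValueError("retry_limit must not be negative")

    for _ in range(retry_limit):
        if operation():
            return "recoverable"  # a retry succeeded within the limit
    return "permanent"            # every allowed retry failed
```

For example, an operation that fails twice and then completes is reported as
recoverable when retry_limit is 5, and as permanent when retry_limit is 2.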


Handling Data Checks 


Media maintenance is not required for all data checks. Only the data checks that 
have an effect on your data require media maintenance actions. Understanding 
some additional attributes of data checks, namely the repeatability, visibility, and 
source is helpful when you need to analyze a data check. With the 3390, the 
subsystem analyzes the errors for you and notifies you when a condition exists that 
requires you to take corrective action. 


Repeatability and Visibility of Data Checks 


Every error has a certain degree of repeatability. The repeatability determines the 
probability of an error being detected for any given read operation. For example, if 
a data check is 1% repeatable, 99 times out of 100 the data is read error free. 
Conversely, if a data check is 99% repeatable, 1 time out of 100 the data is read
error free.
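

The arithmetic behind these percentages can be checked with a short Python sketch.
The function names are hypothetical, and the second function adds an assumption
that is not stated in this book: that each read detects the error independently of
the others.

```python
def expected_error_free_reads(repeatability: float, reads: int) -> float:
    """Expected number of reads, out of `reads`, that complete error free."""
    if not 0.0 <= repeatability <= 1.0:
        raise ValueError("repeatability must be between 0 and 1")
    return (1.0 - repeatability) * reads


def probability_detected_at_least_once(repeatability: float, reads: int) -> float:
    """Chance that the data check is detected on at least one of `reads` reads.

    Assumes independent reads (an assumption made for this sketch only).
    """
    if not 0.0 <= repeatability <= 1.0:
        raise ValueError("repeatability must be between 0 and 1")
    return 1.0 - (1.0 - repeatability) ** reads
```

With a repeatability of 0.01, about 99 of 100 reads are expected to be error free,
which matches the first example above. Under the independence assumption, the same
error would still be detected at least once in 100 reads somewhat more often than
not.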


A data check can occur for defects smaller than the area of a single bit on the 
surface of the media. Because these defects are so small, data that is rewritten has 
the potential to “straddle” the defect and prevent subsequent reads from detecting 
an error. Such an error has low repeatability. 
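
To see what repeatability implies over many reads, the following Python sketch computes the chance that an error is detected at least once in a series of reads. It assumes that each read is an independent trial, which is a simplification made for illustration and is not stated in this manual; the function name is hypothetical.

```python
def detection_probability(repeatability: float, reads: int) -> float:
    """Probability that an error is detected at least once in `reads` reads.

    repeatability: per-read detection probability (0.01 for "1% repeatable").
    Assumes independent reads (an illustrative simplification).
    """
    if not 0.0 <= repeatability <= 1.0:
        raise ValueError("repeatability must be between 0 and 1")
    return 1.0 - (1.0 - repeatability) ** reads


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(detection_probability(0.01, n), 3))
```

Under this assumption, even an error with low repeatability is likely to be detected eventually if the same data is read often enough.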


The percentage of time that a data check is detected after multiple write operations 
determines its visibility. 


These attributes can be useful when you need to determine whether corrective 
action should be taken immediately, or scheduled for a time that will minimize the 
impact to your operations. 


Source of Data Checks 


You should correctly identify the source of an error to determine whether you can 
handle the error situation yourself, or if you will require the assistance of an IBM 
service representative. Errors that appear to be media-related can actually be the 
result of a hardware problem. For example, the source of a data check may be: 


• A slight misalignment of the head with the center of the track. This is a 
  hardware problem that you can rectify with media maintenance procedures. 

• A transient electrical interference. This is a random event and will probably not 
  recur. 

• An imperfection on the disk surface. This is a media problem for which you can 
  perform maintenance. 

• Controller or device errors. Assistance of an IBM service representative is 
  required. 


Determining the source of an error, and then the exact cause, can require analysis 
of sense information and other diagnostics, such as Device Support Facilities. 


Identifying the Source with SIM DASD: Some DASD types monitor data checks and 
other events taking place in the storage subsystem and thoroughly analyze this 
information for you. When a given event continues to recur, or is serious enough to 
require attention, the subsystem produces a SIM. The SIM is formatted by the 
system error recovery procedures, and sent to the operator console to notify the 
operator that a SIM has been generated. The console SIM provides the following 
information: 


• Whether the source of the problem is media or hardware related. 

• An indication of the severity of the SIM being reported. 

• If the source of the problem is media related, the media maintenance procedure 
  number that you need to perform. 

• The serial number of the failing volume. 


The EREP program produces two reports that contain information similar to that 
found in the console SIM message. For more information about SIMs and EREP, see 
“Chapter 2. The Error Handling Process” on page 21 and “Chapter 3. Performing 
Media Maintenance on SIM DASD” on page 27. For detailed information on the 
Device Support Facilities media maintenance functions, see “Chapter 5. Device 
Support Facilities” on page 57. 


Determining the Source with non-SIM DASD: For DASD that do not produce SIMs, 
determining the source of an error, and then the exact cause, requires analysis of 
sense information and possibly other diagnostics such as Device Support Facilities. 
There are several System Exception Reports produced by EREP that provide the 
information you need to complete the analysis of the error. For hardware errors, 
the IBM service representative uses EREP reports and other tools to further isolate 
the cause of the error and determine the required repair action. The probable 
source of an error, as defined by the EREP program, is referred to as a probable 
failing unit (PFU) in the System Exception Reports. 


If the PFU is a channel, storage control, controller, or device, then the source of the 
error is defined as the hardware. The error might be detected during a write, read, 
or control operation. Hardware errors usually require the attention of an IBM 
service representative. 


When the cause of the problem is determined to be hardware, it is usually possible 
to replace the failing component, which might be a logic card, power component, or 
other field replaceable unit (FRU). 


If the PFU is a volume, the source of the error is defined as media, or something 
associated with writing or reading data on the disk media. The error is detected 
during reading of data from the disk and is called a data check. You can use Device 
Support Facilities when media maintenance actions are required. 
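
The rule that maps a probable failing unit to the source of an error can be written down directly. The following Python sketch is illustrative; the function name and the lowercase unit names are hypothetical and are not values taken from EREP reports.

```python
# PFU categories that the text defines as hardware sources.
HARDWARE_PFUS = {"channel", "storage control", "controller", "device"}


def source_of_error(pfu: str) -> str:
    """Return "hardware" or "media" for a probable failing unit (PFU)."""
    unit = pfu.strip().lower()
    if unit in HARDWARE_PFUS:
        # Usually requires the attention of an IBM service representative.
        return "hardware"
    if unit == "volume":
        # Media maintenance with Device Support Facilities may apply.
        return "media"
    raise ValueError(f"unrecognized probable failing unit: {pfu!r}")
```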


Refer to “Chapter 2. The Error Handling Process” on page 21 for an overall 
description of the error handling process and to “Chapter 4. Performing Media 
Maintenance on Non-SIM DASD” on page 43 for more information on EREP System 
Exception Reports and non-SIM DASD. For detailed information on the Device 
Support Facilities media maintenance functions, see “Chapter 5. Device Support 
Facilities” on page 57. 


Figure 4 summarizes some of the effects of data checks occurring in the various 
storage subsystem components. 


Failing 
Component         Effects of the Failure 

Channel or        Availability of data on all connected DASD strings may be affected. 
Storage Control   If the error is permanent, data is unavailable through the path 
                  associated with the failing component until the problem is corrected. 

                  If the storage subsystem configuration includes alternate paths, data 
                  can continue to be written and read through another path. 

                  If the error is permanent, service for the failing unit is required 
                  to restore use of the path. 

                  The data on the volume is accessible during the repair action. 

Controller        Availability of data on all volumes in the string may be affected. If 
                  the error is permanent, data is unavailable through the failing 
                  controller until the problem is corrected. 

                  If there is an alternate controller (3350, 3375, 3380, 3390), data can 
                  continue to be written and read through another controller. 

                  If the error is permanent, service is required to restore the 
                  controller to use. 

                  The data on the volume is accessible during the repair action. 

Device            Availability of data on the volume(s) associated with the failing 
                  device is affected. If the error is permanent, data is unavailable 
                  from the affected volume(s) until the problem is corrected. 

                  If the error is permanent, service for the probable failing unit is 
                  required to restore its use. 

                  The data on the volume is not accessible during the repair action. 

Volume            A specific portion of data on the volume may be damaged. 

                  If the error is temporary (because it was recovered by the subsystem 
                  or system), the data remains available. If the error is permanent, 
                  the affected data on the track is no longer available. 

                  Media maintenance performed with the Device Support Facilities can 
                  remedy the cause of the error in most cases. With concurrent media 
                  maintenance, it is possible to recover from the error while retaining 
                  access to the data on the volume. 


Figure 4. Effects of Failures 


For more information on concurrent media maintenance, see “Understanding 
Concurrent Media Maintenance” on page 26 and “Using Concurrent Media 
Maintenance” on page 58. 


Impact of Data Checks 


When a data check occurs, the error could affect: 


• The operation in progress 
• Continued availability of the data 
• Continued processing 
• The data itself 


It is essential to quickly determine the severity and impact of data checks at your 
data processing center and to correct them. Data checks caused by media errors 
that are affecting your operation can be corrected using the media maintenance 
services provided by Device Support Facilities. For detailed information on the 
Device Support Facilities media maintenance functions, see “Chapter 5. Device 
Support Facilities” on page 57. 


Chapter 2. The Error Handling Process 


Maintaining required levels of availability and performance for your data processing 
system depends upon your recognizing and handling errors that can occur during 
disk storage operations. Normal processing by the storage subsystem (that is, the 
storage control and its attached storage devices) and by the operating system (MVS, 
VM, or VSE) includes various functions that recognize errors and recover from them 
whenever possible. For input/output (I/O) activity, descriptive records of errors may 
be generated and recorded for future analysis. The storage subsystem and the 
operating system, communicating with each other, recognize and usually recover 
from errors before you know they exist. 


This chapter gives a general description of the respective roles of the storage 
subsystem and the operating system in the automatic handling of errors, followed by 
a section that describes regular backup and recovery procedures that can help 
minimize disruption to your operation. In addition, this chapter provides a brief 
section on performing media maintenance, including a description of concurrent 
media maintenance. 


Overview 


Maintenance facilities provided by the storage subsystem and the operating system 
help to ensure reliable disk storage operations. Maintenance functions include: 


• Notification of conditions that require corrective action (SIMs) 
• Statistical counts, such as number of bytes read (MDRs) 
• Generating records that describe error activity (CCHs/SLHs, OBRs, MIHs) 
• Detecting errors that occur and automatically recovering when possible with 
  error recovery procedures. 


Each storage subsystem performs some level of error correction activity. The 
subsystem may send sense information to a host to request recovery actions. In 
addition, this same sense information may request logging of error or statistical 
information, if applicable. Refer to “The Role of the Storage Subsystem” on 
page 22 for a more complete description of sense information. 


If an error is detected during execution of a channel command, the subsystem 
activates its error recovery procedures. If subsystem error recovery is 
unsuccessful, the system error recovery procedures are activated. 


As part of the recovery process, the subsystem can attempt to read the data by 
offsetting the access mechanism to the left and to the right of the track center. The 
offset recovery process often results in a successful read of the data. 
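
The two-level recovery sequence, including the offset reads, can be sketched as follows. This Python example is a simplified model under stated assumptions: the callback names, the offset values (0 for track center, -1 and 1 for left and right), and the returned labels are all hypothetical and do not describe the actual subsystem microcode or operating system routines.

```python
from typing import Callable, Optional, Sequence, Tuple


def subsystem_recovery(read_at: Callable[[int], Optional[bytes]],
                       offsets: Sequence[int] = (0, -1, 1)) -> Optional[bytes]:
    """Try the read at track center, then offset to the left and right."""
    for offset in offsets:
        data = read_at(offset)
        if data is not None:
            return data
    return None


def handle_read_error(read_at: Callable[[int], Optional[bytes]],
                      system_recovery: Callable[[], Optional[bytes]]
                      ) -> Tuple[str, Optional[bytes]]:
    """Subsystem recovery runs first; system recovery only if that fails."""
    data = subsystem_recovery(read_at)
    if data is not None:
        return "subsystem", data
    data = system_recovery()
    if data is not None:
        return "system", data
    return "unrecovered", None
```

The ordering matters: the system error recovery procedures are reached only after the subsystem has exhausted its own attempts.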


The remainder of this chapter covers the following topics: 


• The respective roles of the storage subsystem and the operating system in 
  handling errors 

• Routine practices that can help in the error recovery process 

• The benefits of performing media maintenance and concurrent media 
  maintenance. 


The Role of the Storage Subsystem 


The storage subsystem, consisting of the storage control, controller, and disk 
storage, participates in monitoring I/O activity and error recovery during disk 
storage operations. (Refer to Appendix A, “Background Information” on page 71 
for more information on the subsystem components.) 


The subsystem performs the following functions: 

• Adds ECC information to each field of a record when it is written. 

• Detects errors in reading data, in performing control operations, in functioning 
  of the hardware, and in programming. 

• Retries I/O operations for certain error conditions. 

• Assembles usage and error information in the form of sense information. 

• Maintains counts of disk storage usage factors (such as seeks and number of 
  bytes read), and for some disk storage, maintains counts of errors. 

• Some subsystems perform ECC correction activity. Other subsystems send the 
  ECC data to the operating system for correction. 


Subsystems that generate SIMs perform the following SIM generation-related tasks, 
as well as the functions listed above: 


• Log all errors internally. 
• Determine if SIM reporting is necessary. 
• Send the resulting information to the system for presentation of a SIM. 


To supplement your understanding of error handling, these topics are discussed in 
more detail in the remainder of this section. 


Sense Information 


Sense information is data that is sent from the subsystem to the operating system 
when a condition occurs that requires logging to the ERDS or when recovery actions 
are required by the operating system. These sense bytes identify the conditions that 
caused the interruption, indicate the type of error, and provide information about 
where the error was detected. The sense bytes could provide further information for 
system error recovery procedures and for diagnosing and isolating the cause of an 
error condition. 


The following conditions will cause the subsystem to return sense information to the 
operating system: 

• An error condition that could not be recovered by subsystem retry 

• An error condition that was successfully recovered by the subsystem, but is to 
  be logged by the operating system 

• An error condition that the subsystem does not retry and presents to the 
  operating system to recover 

• Filled counters of usage or errors that need to be sent to the operating system 
  for logging 

• A determination by the storage subsystem that there is a need for service. 


Subsystem Counting and Logging 


A buffered log is kept in the storage control for each disk storage device that 
attaches to it. The log may contain counts of the seeks made, the bytes read, and 
overruns. This data may be used by the operating system when performing analysis 
functions. 


For some DASD, counts of data checks and seek checks are also kept. 


For the 3390, all subsystem activity is buffered in the subsystem and analysis of 
this data is done at the subsystem level. A SIM is generated and logged in 
ERDS if any abnormal condition is detected. The SIM provides sufficient 
information regarding media maintenance and repair actions. Statistical 
records (MDRs) are also sent to the operating system to be logged in ERDS. 


For all other devices, EREP analyzes the data in the ERDS and then determines if 
media maintenance or repair actions are required. In some cases, additional data 
may be needed to complete the analysis. For more information on how to acquire 
additional data, see “The Steps in Handling Errors on Non-SIM DASD” on page 53. 


For the 3380, 3375, 9332, and 9335, and for the 3370 attached to a 3880, seek 
checks and data checks are monitored by the subsystem. If either reaches an 
abnormal level, according to the subsystem standards, sense information for a 
preset number of subsequent errors is sent to the system for logging. 


For the 3330/3333 and for the 3370 attached directly to a processor, counts are 
kept of all data checks and seek checks and are sent to the system. 


For the 3350, counts are kept of retried data checks and seek checks and are 
sent to the system. 


For the 3340/3344, no count is kept of data or seek checks. 
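
The per-device counting behavior described above lends itself to a lookup table. The following Python sketch restates those rules; the table layout, the attachment labels "3880" and "processor", and the function name are illustrative choices and are not part of any IBM interface.

```python
from typing import Optional

SIM = "activity buffered and analyzed in the subsystem; SIM generated if abnormal"
MONITORED = "checks monitored; sense for a preset number of errors sent if abnormal"
ALL_COUNTS = "counts of all data checks and seek checks sent to the system"
RETRIED = "counts of retried data checks and seek checks sent to the system"
NO_COUNTS = "no count kept of data or seek checks"

# (device type, attachment) -> counting behavior; None means "any attachment".
COUNTING_POLICY = {
    ("3390", None): SIM,
    ("3350", None): RETRIED,
    ("3370", "3880"): MONITORED,       # 3370 attached to a 3880
    ("3370", "processor"): ALL_COUNTS,  # 3370 attached directly to a processor
}
for _type in ("3380", "3375", "9332", "9335"):
    COUNTING_POLICY[(_type, None)] = MONITORED
for _type in ("3330", "3333"):
    COUNTING_POLICY[(_type, None)] = ALL_COUNTS
for _type in ("3340", "3344"):
    COUNTING_POLICY[(_type, None)] = NO_COUNTS


def counting_policy(device_type: str, attachment: Optional[str] = None) -> str:
    """Look up how data checks and seek checks are counted for a device."""
    for key in ((device_type, attachment), (device_type, None)):
        if key in COUNTING_POLICY:
            return COUNTING_POLICY[key]
    raise KeyError(f"no counting policy for {device_type!r} ({attachment!r})")
```

The 3370 is the only type in this table whose behavior depends on how it is attached, so a lookup for it without an attachment raises an error rather than guessing.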


The Role of the Operating System Error Recovery Procedures 


The operating system provides standard error recovery procedures to handle errors 
detected by the storage subsystem. 


The system error recovery process: 


• Implements recovery actions 
• Logs usage and error information records 
• Issues system messages at the operator console. 


More information on each of these functions is included in the remainder of this 
chapter. In addition, the EREP reports that are used in the media maintenance 
process are described. These reports are generated from the contents of the ERDS 
maintained by the system. 


System Recovery Actions 


A specific recovery action by the system is based on a particular error condition that 
was defined in the sense information sent from the subsystem. For example, 
system recovery actions include retrying an operation when an equipment check is 
reported in the sense information. 


System Logging 


Error data that is sent to the system is stored in the ERDS. The process of recording 
information in the ERDS is called logging. 


For logging purposes, the system processes the sense information to produce and 
supplement data records describing the conditions under which the error occurred. 
The operating system processes three types of data records for disk storage; all are 
based on sense information supplied by the subsystem. The three kinds of data 
records are: 


e Usage and error counts 
e Synchronous events 
e Asynchronous events. 


The EREP program formats error reports and may also perform error analysis 
based on information it obtains from the ERDS. (See “Using EREP to Generate 
Reports” on page 25.) 


System Console Messages 


System console messages may be the initial notification of a hardware or media 
problem. Most information messages contain data on the type and location of an 
error, and give sense information in hexadecimal format. Generally, console 
messages can be associated with information in the ERDS. The appropriate EREP 
reports should be run for a more detailed analysis of the problem. 


For information on system console messages for SIM DASD, see “Chapter 3. 
Performing Media Maintenance on SIM DASD” on page 27. For information on 
system console messages for non-SIM DASD, see “Chapter 4. Performing Media 
Maintenance on Non-SIM DASD” on page 43. 


Routine Practices 


Certain routine practices can help to ensure error recovery with minimal disruption 
to processing and to the user community. First of all, your normal backup and 
recovery procedures must be effective and adequate to meet your needs if data is 
damaged or lost for any reason, including disk media problems. In addition, the 
appropriate EREP reports should be generated on a regular basis to help you 
identify errors and provide a history of problems related to disk storage. 


Backing up Volumes 


The way you control and manage backup procedures must be tailored to suit the 
needs of your applications and users. It is important to back up data at the correct 
frequency, select an appropriate backup medium, and maintain an adequate 
number of backup versions. Several of the major considerations for defining the 
backup procedures can affect the ease with which error recovery takes place. For 
example: 


• The rate at which data changes 
• The time and resources required for synchronizing and updating backup copies 
• Options for rebuilding, as opposed to re-creating data from backup versions. 


In assessing the adequacy of your backup procedures, you should also consider 
factors such as how critical the data is to the business and what the time 
requirements are in your applications. 


For more information on backup and recovery applicable to your device type and 
operating environment, refer to the appropriate operating environment manual listed 
in the “Storage Subsystem Library” sections of the “Bibliography” on page 121. 


Using EREP to Generate Reports 


EREP is an IBM product that helps you to monitor the functioning of various units in 
your system, such as the processor, disk storage, tape drives and other I/O devices, 
controllers, and channels. This diagnostic aid provides you with information on 
errors that have occurred in the components of your system. EREP helps you to 
identify units that may be malfunctioning or exceeding predefined error limits. By 
taking prompt corrective action, you can help to achieve continued efficient 
operation. 


If an error occurs, the operating system creates a record from data provided by the 
hardware or software and writes it in the ERDS, in accordance with the error 
recording requirements for the DASD type. EREP processes these records and 
produces reports, according to your specifications. These reports are designed for 
use by you, or by the service representative who may be called for certain error 
conditions. 


Traditionally, these reports have been used primarily by an IBM service 
representative. However, you can use the EREP reports to help you determine the 
nature of error incidents and make decisions regarding media maintenance actions. 
These reports include explicit information on disk storage errors. 


The EREP reports describe the type and location of errors and give other needed 
details. If you are running EREP and consolidating from all of the systems that 
share DASD, the EREP reports will give you information from all the systems that 
share the disk storage. The ERDS for each system should be used as a basis for a 
multi-system report. These reports also include temporary and permanent errors, 
and suggest the probable failing unit. Refer to “Chapter 1. Introduction” on 
page 15 for information on error classification considerations, such as “temporary 
versus permanent” and “probable failing unit.” 


You should run these reports daily as a normal procedure, and a member of the 
data processing center staff should be responsible for reviewing the reports for 
actual or potential disk storage problems. 


“Using EREP Reports for SIM DASD” on page 33 provides specific instructions for 
reading EREP reports pertaining to SIMs. 


For non-SIM DASD, “Using the System Exception Reports” on page 44 provides 
specific instructions for reading the System Exception Reports and “The Steps in 
Handling Errors on Non-SIM DASD” on page 53 explains how each of these reports 
is used along with Device Support Facilities for handling an error situation. 


Refer to EREP User’s Guide and Reference for a complete description of the 
program and its functions. 


Performing Media Maintenance 


Performing media maintenance actions on a timely basis will help ensure that 
performance and availability of your data processing system are maintained. 


The exact procedures that you implement for handling and scheduling media 
maintenance activities must be tailored to suit the performance and availability 
requirements of your user community. It is a good idea to document your 
procedures for handling disk storage errors along with your other data processing 
operational procedures. 


If your media maintenance actions do not correct the problem, call your IBM service 
representative for assistance. 


Understanding Concurrent Media Maintenance 


Concurrent media maintenance is a capability that allows media maintenance to be 
performed on a track while access to the data from that track continues 
concurrently. The entire volume, including the data on the track being repaired, is 
available for use by all users from all systems that share the volume. The capability 
is designed to address media problems that have resulted in temporary or 
correctable data check events. 


To allow concurrent access, the track to be worked on is copied to an alternate track 
on the same volume. The 3990 directs I/Os for the original track to the alternate 
track for the duration of the Device Support Facilities operation on the original track. 
No loss of data access is incurred. When the operation is complete, the data is 
copied back to the original track and the alternate track is released. 
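
The copy, redirect, and copy-back sequence can be modeled in a few lines. The following Python sketch is a toy simulation only: the Volume class, the track addressing, and the maintain callback are hypothetical, and nothing here represents the actual 3990 or Device Support Facilities implementation.

```python
from typing import Callable, Dict, Hashable


class Volume:
    """Toy model of a volume: track address -> data, plus I/O redirection."""

    def __init__(self, tracks: Dict[Hashable, bytes]):
        self.tracks = dict(tracks)
        self.redirect: Dict[Hashable, Hashable] = {}  # original -> alternate

    def read(self, track: Hashable) -> bytes:
        return self.tracks[self.redirect.get(track, track)]

    def write(self, track: Hashable, data: bytes) -> None:
        self.tracks[self.redirect.get(track, track)] = data


def concurrent_maintenance(volume: Volume, track: Hashable, alternate: Hashable,
                           maintain: Callable[[Volume, Hashable], None]) -> None:
    """Work on `track` while user I/O is served from `alternate`."""
    volume.tracks[alternate] = volume.tracks[track]  # copy to alternate track
    volume.redirect[track] = alternate               # redirect I/O for the track
    try:
        maintain(volume, track)                      # maintenance on the original
    finally:
        volume.tracks[track] = volume.tracks[alternate]  # copy data back
        del volume.redirect[track]                       # stop redirecting
        del volume.tracks[alternate]                     # release the alternate
```

In this model, reads and writes issued through the volume during maintenance reach the alternate track, so updates made in the meantime are preserved when the data is copied back.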


Concurrent media maintenance is offered with the 3990 Models 2 and 3 in 
combination with Device Support Facilities (Release 11 and subsequent releases), 
3990 features, and operating system support. Both 3380 and 3390 DASD are 
supported. For more information on performing concurrent media maintenance with 
ICKDSF, see “Using Concurrent Media Maintenance” on page 58. 


For information on how concurrent media maintenance works with SIM DASD, see 
“The Steps in Handling a SIM” on page 36 and “Automating the Steps in Handling a 
SIM” on page 38. 


For information on how concurrent media maintenance works with non-SIM DASD, 
see “Using Concurrent Media Maintenance on Non-SIM DASD” on page 43. 


Chapter 3. Performing Media Maintenance on SIM DASD 


This chapter describes how SIMs produced by the 3990-3390 subsystem help you 
determine when media maintenance is necessary and what Device Support 
Facilities functions are required to perform it. Media maintenance techniques are 
centered around the use of the SIM for the 3390. SIMs appear on the operator 
console and are listed in the Service Information Messages report, a report within 
the System Exception Report series, and in the Asynchronous Notification Record 
Detail report produced by EREP. 


It is important to note that this section describes SIMs that relate to the DASD 
portion of the 3990-3390 subsystem only, particularly those that relate to media 
maintenance. For a complete description of SIMs that relate to the 3990, see the 
IBM 3990 Storage Control Planning, Installation, and Storage Administration Guide 
and the IBM 3990 Storage Control Reference. 


Before reading this chapter, you should have a basic understanding of error 
characteristics, how the storage subsystem and operating system contribute to the 
error handling process, and a basic understanding of EREP functions. This 
information is described in detail in “Chapter 1. Introduction” on page 15 and 
“Chapter 2. The Error Handling Process” on page 21. 


Using the Service Information Message (SIM) 


A SIM is a summary message prepared and sent to a host asynchronously to any 
given error event or events. It reflects the result of error event collection and 
analysis in the 3990/3390 storage subsystem and indicates that some kind of service 
action needs to be taken. 


The SIM message contains a description of the problem the subsystem has 
identified (the impact of failure), a description of what resources the repair action 
will affect (the impact of repair), and a definition of the action that should be 
taken to resolve the problem. 


With the introduction of the 3990/3390 storage subsystem, error collection, 
consolidation, filtering, and analysis are handled within the subsystem. The 
subsystem’s SIM problem analysis programs are more comprehensive than the host 
EREP problem analysis capability used in support of earlier DASD subsystems. 


SIMs eliminate the need for customers to perform problem determination activities. 
They also greatly reduce problem re-creation activities by the service 
representative that are often required with earlier DASD subsystems. The SIM 
contains complete information for the service representative including coded 
information that identifies the FRU that should be replaced. The SIM also provides 
improved information for customers in planning for the service action. 


Identifying the Need for Service 


There are several ways to identify that service is required: 

• The SIM Alert console message is usually the first indication that service may 
  be needed. 

• The Service Information Messages report produced by EREP provides a listing 
  of SIMs currently in the EREP history file. 


The Service Information Messages report produced by EREP is the most complete 
means to verify that a service action is required, and is a useful tool in following up 
on service needs brought to your attention in other ways. The EREP Asynchronous 
Notification Record Detail report is also useful when information regarding a 
specific SIM is needed. 


Types of SIMs for DASD Devices 


There are two types of SIMs that provide service information for the 3390: 


DASD SIM    Indicates a hardware condition exists. This condition is not media 
            related and requires the attention of an IBM Service Representative. 
            A DASD SIM causes a DASD ALERT message to be sent to the console 
            and a SERVICE ALERT to appear on EREP reports that provide SIM 
            information. 

Media SIM   Indicates a media condition exists. The customer should perform the 
            recommended media maintenance action. A media SIM causes a 
            MEDIA ALERT message to be sent to the console and a MEDIA ALERT 
            to appear on EREP reports that provide SIM information. 


Terminology 


The term SIM Alert is used where information pertains to both MEDIA ALERT 
and DASD ALERT console messages. 


On a 3990-3390 subsystem, a media SIM is generated when the source of the failure 
is the volume. A SIM Alert presented at an operator console provides valuable 
information regarding the source and severity of the problem. 


The Service Information Messages report and Asynchronous Notification Record 
Detail report produced by EREP contain information similar to that found in the SIM 
Alert, including the track address where the failure occurred and the action required 
to correct it. 


Using the Console SIM Alert 


If the subsystem determines that a service action is needed, a SIM is logged in 
ERDS and a SIM Alert message is routed to the operator’s console. 


A MEDIA ALERT presented at the operator’s console provides the volume serial 
number of the failing volume and the address of the failing track. The MEDIA 
ALERT also provides the Device Support Facilities procedure number required for 
media maintenance. The severity field allows you to determine if the condition 
needs immediate attention or if it is possible to defer the action. For a more 
detailed description of the SIM severity field, see “SIM Severity Levels” on page 31. 


Depending on your operating system, operator console messages may go unnoticed 
because they are displayed for such a short time or they roll off the screen. When 
possible, SIM Alerts are highlighted and issued so as not to roll off the screen to 
encourage immediate operator attention. In any event, you should continue to run 
EREP on a routine basis. 


Some SIM Alerts can be suppressed using the SIM Severity Reporting Option in the 
3990 VPD. See IBM 3990 Storage Control Planning, Installation, and Storage 
Administration Guide for more details on this option. 


The following sections describe how to interpret the console SIM Alert. 


Console SIM Alert in MVS and VM 


The following example shows a SIM Alert formatted to the system operator’s 
console. This MEDIA ALERT was generated in the MVS/XA* operating environment, 
for a 3390 Model 2. 


The following example shows the components of a SIM Alert and the possible fields 
that can occur in the SIM Alert message for the 3390. 


{IEA480E | DMKmmm403I | HCPERP403I} yyyy {DASD | MEDIA}, 
{ACUTE | SERIOUS | MODERATE | SERVICE} ALERT, MT=machine type/model, SER=MMPP-SSSSS, 
REFCODE=nnnn-nnnn-nnnn, VOLSER=volser, ID=id, CCHH=x'cccc hhhh', REPEATED 


Figure 5 describes the fields in the 3390 SIM Alert.


Figure 5. Description of the 3390 SIM Alert Fields

Field        Description

IEA480E      The 'IEA' message is received for SIM Alerts on MVS
             systems.

DMKmmm403I   The 'DMK' message is received for SIM Alerts on VM/SP
             HPO systems; 'mmm' represents the module issuing the
             message.

HCPERP403I   The 'HCP' message is received for SIM Alerts on VM/XA
             systems.

yyyy         The address of the device that reported the failure.

DASD         Tells you the problem is hardware related and will require
             the attention of an IBM Service Representative.

MEDIA        Tells you the problem is media related and will require you
             to perform a media maintenance action.

ACUTE        Represents the severity of the SIM being reported. For a
SERIOUS      description of severity levels, see “SIM Severity Levels” on
MODERATE     page 31.
SERVICE

MT           Machine type and model number (7 characters maximum).

SER          MM identifies the manufacturer (01 indicates IBM).
             PP identifies the manufacturing plant.
             SSSSS is the five-digit machine serial number.

REFCODE      Twelve hex characters that give the IBM service
             representative the information needed to repair the fault.
             For MEDIA ALERTs, the last character is the media
             maintenance procedure that you need to perform.

VOLSER       The volume serial number of the failing volume.

ID           SIM ID, two hex characters.

CCHH         Appears for MEDIA ALERTs only. The cylinder and head
             address of the failing track.

REPEATED     Appears for DASD ALERTs only. This field is shown when
             the SIM is a repeat presentation of a previously reported
             SIM.


* MVS/XA is a trademark of the International Business Machines Corporation.

* VM/XA is a trademark of the International Business Machines Corporation.
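
The field layout in Figure 5 can be illustrated with a short parsing sketch. The Python below is not part of the manual's REXX samples; it is a minimal sketch that assumes the fields arrive in the order shown above, separated by commas. The exact spacing and punctuation of a real console line are assumptions, the names `ALERT_RE` and `parse_sim_alert` are hypothetical, and the sample message is assembled from the values in the report example in Figure 10.

```python
import re

# Hypothetical sample line, assembled from the field layout in Figure 5.
# The exact separators in real console output are an assumption.
SAMPLE = ("IEA480E 08E2,MEDIA,SERVICE ALERT,MT=3390-02,SER=0113-14172,"
          "REFCODE=4340-8880-2283,VOLSER=TSO8E2,ID=0A,CCHH=X'022B 0007'")

ALERT_RE = re.compile(
    r"(?P<msgid>IEA480E|DMK\w{3}403I|HCPERP403I)\s+"
    r"(?P<device>[0-9A-F]{3,4}),\s*"
    r"(?P<component>DASD|MEDIA),\s*"
    r"(?P<severity>ACUTE|SERIOUS|MODERATE|SERVICE) ALERT,\s*"
    r"MT=(?P<machine>[^,]{1,7}),\s*"
    r"SER=(?P<mfr>\w{2})(?P<plant>\w{2})-(?P<serial>\w{5}),\s*"
    r"REFCODE=(?P<refcode>[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4})"
    r"(?:,\s*VOLSER=(?P<volser>\w{1,6}))?"
    r"(?:,\s*ID=(?P<sim_id>[0-9A-F]{2}))?"
    r"(?:,\s*CCHH=X'(?P<cyl>[0-9A-F]{4}) (?P<head>[0-9A-F]{4})')?"
    r"(?P<repeated>,\s*REPEATED)?"
)


def parse_sim_alert(text):
    """Return a dict of SIM Alert fields, or None if the line is not a SIM Alert."""
    match = ALERT_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # REPEATED is a flag rather than a value, so store it as a boolean.
    fields["repeated"] = fields["repeated"] is not None
    return fields


fields = parse_sim_alert(SAMPLE)
print(fields["component"], fields["severity"], fields["refcode"])
```

VOLSER, ID, CCHH, and REPEATED are treated as optional in the pattern because Figure 5 states that CCHH appears only for MEDIA ALERTs and REPEATED only for DASD ALERTs.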


Console SIM Alert in VSE

A SIM Alert message has the same format as a unit check error message. The
format of a SIM Alert message is shown in Figure 6. Relevant fields are
highlighted.


Figure 6. Sample SIM Alert Message in VSE


where:

OPxxI        Message ID
MSG DESCRP   Message text
SYSO011=yyy  I/O address
SNS=nn...nn  Sense bytes


The 32 sense bytes, shown in hexadecimal, are the data upon which the SIM is
based. The Service Information Messages report and the Asynchronous Notification
Record Detail report contain detailed information, including the severity, the track
address where the abnormal condition occurred, and the action required to correct
it. Therefore, you should continue to run EREP on a routine basis. Use the Service
Information Messages report to get a complete listing of all SIMs generated by the
subsystem. For further information on interpreting SIM sense bytes, see IBM 3990
Storage Control Reference.


The following messages indicate that a SIM may have been received. The four SIM
Alert message IDs, message descriptions, and recommended actions are shown in
Figure 7.


Message ID   Message Description   Recommended Action

OP04I        PATH FENCE            Generate the Service Information Messages report or
                                   the Asynchronous Notification Record Detail report by
                                   running EREP. Schedule system maintenance with
                                   your service representative.

OP05I        OPER INFO             Generate the Service Information Messages report or
                                   the Asynchronous Notification Record Detail report by
                                   running EREP. Schedule system maintenance with
                                   your service representative.

OP64I        MAINT REQD            Generate the Service Information Messages report or
                                   the Asynchronous Notification Record Detail report by
                                   running EREP. Schedule system maintenance with
                                   your service representative.

OP65I        MEDIA ERR             Generate the Service Information Messages report or
                                   the Asynchronous Notification Record Detail report by
                                   running EREP. You can fix most media errors using
                                   Device Support Facilities.


Figure 7. SIM Alert Messages and Recommended Responses in VSE


For more information, see Using IBM 3390 Direct Access Storage in a VSE 
Environment and IBM VSE/SP Messages and Codes. 


SIM Severity Levels 
SIM Alerts include a severity level that indicates whether the service action can be
deferred or whether you need to take immediate action. The following table provides a
brief description of SIM severity levels.


Figure 8. General Description of SIM Alert Severity Fields for 3990 and 3390

SERVICE

   Meaning:             No system or application performance degradation is
                        expected in any environment. No system or application
                        outage has occurred.

   Recommended action:  At the earliest convenience, evaluate the potential
                        effects on system operations. Plan to take corrective
                        action, if necessary.

                        If you defer action, application outages and/or
                        unacceptable performance degradation may occur if
                        previously recoverable exceptions become unrecoverable.

MODERATE (not applicable for MEDIA ALERTs)

   Meaning:             Performance degradation is possible in a heavily loaded
                        environment. No system or application outage has
                        occurred.

   Recommended action:  Promptly evaluate the effects on system operations.
                        Plan to take corrective action, if necessary.

                        If you defer action, application outages and/or
                        unacceptable performance degradation may occur if
                        previously recoverable exceptions become unrecoverable.

SERIOUS

   Meaning:             A primary I/O subsystem resource is disabled.
                        Significant performance degradation is possible.
                        System or application outage may have occurred.

   Recommended action:  Immediately evaluate the effect on system operations.
                        Plan appropriate system recovery actions.

                        Product service and/or action by the installation is
                        required to restore the product to full operation.
                        Determine the actions required.

ACUTE (not applicable for MEDIA ALERTs)

   Meaning:             A major I/O subsystem resource is disabled, or damage
                        to the product is possible. Performance may be severely
                        degraded. System and/or application outages may have
                        occurred.

   Recommended action:  Treat as an emergency. Remove data from any DASD
                        volumes implicated. Evaluate the current or potential
                        effect on system and application operations. Determine
                        appropriate system recovery actions or actions to
                        prevent possible product damage.

                        Product service and/or action by the installation is
                        required to restore the product to full operation.
                        Determine the actions required.

                        Data restoration may be required to resume normal
                        system or application operation.


Figure 9 shows the relationship between the failing component and the meaning of 
the severity field in the SIM Alert message as it pertains to a 3390. 


Figure 9. Interpreting the Severity Field of the SIM Alert by Failing Component

Failing component: DASD

   SERVICE    Indicates degraded DASD performance due to recurring
              recoverable equipment/data checks caused by a hardware fault.
              Service can be deferred, but an IBM service representative
              must handle the error.

   MODERATE   A permanent error occurred on the path to the device or
              controller. Service can be deferred, but an IBM service
              representative must handle the error.

   SERIOUS    A permanent error occurred on the device. Access to data is
              lost. Contact your IBM service representative as soon as
              possible.

   ACUTE      A permanent error occurred on the device. Potential loss of
              data. Contact your IBM service representative as soon as
              possible.

Failing component: MEDIA

   SERVICE    Temporary data checks have occurred. Media maintenance action
              is recommended for the track addresses identified in the SIM
              Alert console message or in the Service Information Messages
              report. Your service action can be deferred. An IBM service
              representative is not required.

   MODERATE   Not applicable for MEDIA ALERTs.

   SERIOUS    A permanent data check has occurred. Media maintenance action
              is recommended for the track addresses identified in the SIM
              Alert console message or in the Service Information Messages
              report. Respond to the error as soon as possible. An IBM
              service representative is not required.

   ACUTE      Not applicable for MEDIA ALERTs.
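
The relationships in Figure 9 amount to a small lookup keyed by failing component and severity. The following Python sketch (separate from the manual's REXX samples) encodes only two facts taken from that figure: who handles the condition and whether the action can be deferred. The names `GUIDANCE` and `guidance` are hypothetical, and the wording of the handler strings is illustrative.

```python
# (component, severity) -> (who handles the condition, whether it can be deferred),
# transcribed from Figure 9. MODERATE and ACUTE do not apply to MEDIA ALERTs,
# so those combinations are deliberately absent.
GUIDANCE = {
    ("DASD", "SERVICE"): ("IBM service representative", True),
    ("DASD", "MODERATE"): ("IBM service representative", True),
    ("DASD", "SERIOUS"): ("IBM service representative", False),
    ("DASD", "ACUTE"): ("IBM service representative", False),
    ("MEDIA", "SERVICE"): ("installation media maintenance", True),
    ("MEDIA", "SERIOUS"): ("installation media maintenance", False),
}


def guidance(component, severity):
    """Look up the handler and deferral flag for a SIM Alert."""
    key = (component.upper(), severity.upper())
    if key not in GUIDANCE:
        raise ValueError("combination not defined in Figure 9: %s %s" % key)
    return GUIDANCE[key]


handler, deferrable = guidance("MEDIA", "SERVICE")
print(handler, "- can be deferred" if deferrable else "- respond as soon as possible")
```

Raising an error for the undefined MEDIA combinations, rather than returning a default, keeps an unexpected alert from being silently treated as deferrable.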


Using EREP Reports for SIM DASD 


The Service Information Messages report and Asynchronous Notification Record 
Detail report may be generated by running EREP. These reports include information 
such as the track address where the failure occurred and the action required to 
correct it. Use the Service Information Messages report to get a complete listing of
all SIMs currently in the EREP history file. The Asynchronous Notification Record 
Detail report is useful when information regarding a specific SIM is needed. 


For the 3390, it is not necessary to run EREP to get the information needed to 
perform media maintenance. The console SIM Alert contains similar information to 
that found in the EREP reports and is sufficient for performing media maintenance 
actions. 


EREP Release Level 


The examples presented in this manual assume that you are operating on EREP
Version 3.4.1 or higher.


Interpreting the Service Information Messages Report 


The Service Information Messages report is a report within the System Exception
Report series that shows a formatted listing of SIMs currently in the EREP history
file, regardless of whether action was taken or not. Each SIM listed shows
necessary information on the type of error. In addition, for media SIMs, it provides
you with the Device Support Facilities media maintenance procedure number
required to correct the problem. For a listing and description of each 3390 media
maintenance procedure, see “Error Handling for 3390” on page 102. The following
sample Service Information Messages report shows a listing that includes SERVICE
ALERTs and MEDIA ALERTs.


SERVICE INFORMATION MESSAGES (SIMS)                        REPORT DATE 097 89
                                                           PERIOD FROM 090 89
                                                                  TO   093 89
 [A]    [B]
COUNT   FIRST OCCURRENCE       LAST OCCURRENCE
*****************************************************************************
  1     090/89 00:31:05:73     090/89 00:31:05:73
        [C]         [D]                    [E]                    [F]
        MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 4340-8880-2283 ID=0A
                                                                      [G]
        TEMPORARY DATA CHECK(S) ON SSID 0040, VOLSER TSO8E2 DEV 08E2, 48
        [H]
        PHYSICAL DEVICE 22, CYLINDER 022B TRACK 07
        [I]
        REFERENCE MEDIA MAINTENANCE PROCEDURE 3

        093/89 16:31:23:38     093/89 16:31:23:38
        MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 4380-E081-2585 ID=08
        TEMPORARY DATA CHECK(S) ON SSID 0040, VOLSER JES8E5 DEV 08E5, 59
        PHYSICAL DEVICE 25, CYLINDER 0669 TRACK 03
        REFERENCE MEDIA MAINTENANCE PROCEDURE 5

        090/89 07:33:46:47     090/89 07:33:46:47
        SERVICE ALERT 3390-02 S/N 0113-14172 REFCODE 43C0-8080-E602 ID=01
        [J]
        PERMANENT ERROR(S) ON SSID 0044, VOLSER JES226 DEV 0226, 66
        [K]
        REPAIR WILL DISABLE PHYSICAL DEVICES 26-27

        090/89 01:11:07:95     090/89 01:11:07:95
        SERVICE ALERT 3390-01 S/N 0113-A8416 REFCODE E437-6011-0501 ID=10
        TEMPORARY ERROR(S) ON SSID 0044, VOLSER TSO105 DEV 0105, 57
        REPAIR WILL DISABLE PHYSICAL DEVICES 04-05


Figure 10. Service Information Messages Report Example


Notes to Figure 10:

Key   Description (SIMs)

A     Number of times the subsystem reported this SIM.

B     The time at which the SIM was first reported and the time at which it was
      last reported.

C     This portion of the message tells you the type of SIM generated. In this
      case a MEDIA ALERT indicates that an error condition exists on the media
      and you need to perform a media maintenance action. A SERVICE ALERT
      indicates that the failure occurred on the hardware and you must contact
      your service representative.

D     Product type and model number (in this case, a 3390 Model 2). The serial
      number is also listed.

E     The reference code provides information (including PFU and FRU) for your
      IBM service representative to use to correct the problem. You will need
      to provide your service representative with REFCODE data for SERVICE
      ALERTs.

F     The unique identifier for each SIM. Each time a new SIM is reported by
      the subsystem, it is assigned an ID number.

G     Specifies the channel path ID (CHPID).

H     Appears for MEDIA ALERTs only. This line shows the physical ID of the
      device, as well as the cylinder and track address where the failure
      occurred.

I     Appears for MEDIA ALERTs only. This line tells you the appropriate Device
      Support Facilities procedure number to perform media maintenance. For a
      description of 3390 media maintenance procedures, see Appendix B,
      “Specific Guidelines by DASD Type” on page 81.

J     This line provides information on the impact and the location of the
      failure.

K     Appears for SERVICE ALERTs only. This line tells you the impact the
      repair action will have on the device(s) involved.
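
If you post-process the report, the fields described in the notes can be pulled out with a few regular expressions. The Python below is a minimal sketch, separate from the manual's REXX samples, that assumes each entry has the line layout of the first MEDIA ALERT in Figure 10; the names `ENTRY`, `PATTERNS`, and `parse_entry` are hypothetical, and other EREP release levels may format the lines differently.

```python
import re

# First entry of the sample report in Figure 10, without the count and time line.
ENTRY = """\
MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 4340-8880-2283 ID=0A
TEMPORARY DATA CHECK(S) ON SSID 0040, VOLSER TSO8E2 DEV 08E2, 48
PHYSICAL DEVICE 22, CYLINDER 022B TRACK 07
REFERENCE MEDIA MAINTENANCE PROCEDURE 3
"""

# One pattern per field; each captures the value in group 1.
PATTERNS = {
    "alert": r"^(MEDIA|SERVICE) ALERT",
    "refcode": r"REFCODE (\S+)",
    "sim_id": r"ID=(\w{2})",
    "volser": r"VOLSER (\w+)",
    "device": r"DEV (\w+)",
    "cylinder": r"CYLINDER (\w+)",
    "track": r"TRACK (\w+)",
    "procedure": r"MAINTENANCE PROCEDURE (\d+)",
}


def parse_entry(text):
    """Extract report fields; fields absent from the entry are returned as None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        result[name] = match.group(1) if match else None
    return result


entry = parse_entry(ENTRY)
print(entry["volser"], entry["cylinder"], entry["track"], entry["procedure"])
```

In both MEDIA ALERT entries of the sample report, the procedure number on the last line matches the final character of the REFCODE, which is consistent with the description of the REFCODE field in Figure 5.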


Understanding the Asynchronous Notification Record Detail Report 


The Asynchronous Notification Record Detail report is useful when information 
regarding a specific SIM is needed. This report provides the same information that 
is in the Service Information Messages report. For a description of the fields, see 
“Interpreting the Service Information Messages Report” on page 33. 


REPORTING DEVICE:      0008F0     REPORT:           ASYNCHRONOUS               DAY YEAR
REPORTING DEVICE TYPE: 3390       REPORTING SYSTEM: VS 2 REL. 3 370XA    DATE: 319 89
REPORTING PATH:        59-08F0                                                 HH MM SS.TH
                                                                         TIME: 02 35 30.73
RECORD TYPE: DASD SIM
DEVICE DEPENDENT DATA
SERVICE INFORMATION MESSAGE


MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 41A0-0000-3083 ID=00


PERMANENT DATA CHECK(S) ON SSID 0022, VOLSER RAS8F0 DEV 08F0, 59
PHYSICAL DEVICE 30, CYLINDER 0000 HEAD 01
REFERENCE MEDIA MAINTENANCE PROCEDURE 3


HEX DUMP OF RECORD
A3831810 90000000 0089319F 02353073 423B3826 30900000
90000000 90000000 00000000 00000000 00000000 00000000 205908F0 80052027
QO80008FO D9C1E2F8 C6F00000 00800500 3027EF00 11030000 00308304 2360857C
002241A0 05104600 FF000001


Figure 11. Asynchronous Notification Record Detail Report Example


The Steps in Handling a SIM 


When handling a SIM, you should evaluate the condition in relation to the specific 
circumstances, such as the job in progress and the data in use at the time. For 
instance, the same condition that would cause you to take prompt action if it 
occurred in a catalog data set might be deferred (after investigation) if it occurred 
on a volume being used for temporary data. 


The exact procedures that you implement for handling and scheduling media 
maintenance activities must be tailored to suit the performance and availability 
requirements of your user community. It is a good idea to document your 
procedures for handling disk storage errors along with your other data processing 
operational procedures. 


The most effective control of operations can be achieved by reviewing SIM activity 
and performing media maintenance on a regular basis. This will help you maintain 
required levels of availability and performance for your data processing system. 
For most situations, your media maintenance actions with Device Support Facilities
can be scheduled at a convenient time, or be automatically handled using the
concurrent media maintenance capability, for minimal impact on system
performance and availability. It is important to remember that concurrent media
maintenance allows you to continue to access the data while using Device Support
Facilities to perform maintenance on the track. For more information on concurrent
media maintenance, see “Understanding Concurrent Media Maintenance” on
page 26, “Automating the Steps in Handling a SIM” on page 38, and “Using
Concurrent Media Maintenance” on page 58.


If your media maintenance actions do not correct the problem, call your IBM service 
representative for assistance. Have the associated EREP reports and Device 
Support Facilities output available for the service representative. 


The process outlined here consists of basic steps describing the tasks and tools 
involved in handling a SIM. These steps are shown in Figure 12. 


Figure 12. The Steps in Handling a SIM

Step 1. Detect the need for service

   Tools:    Console SIM Alert, Service Information Messages report, or
             Asynchronous Notification Record Detail report

   Action:   If the SIM appears as a DASD ALERT on the console or SERVICE
             ALERT on the report, call your service representative.

             If the SIM appears as a MEDIA ALERT on the console or the
             report, perform step 2.

Step 2. Perform media maintenance

   Tools:    Device Support Facilities, as specified by the media
             maintenance procedure number listed in the SIM

   Action:   See Appendix B, “Specific Guidelines by DASD Type” on page 81
             for a listing of media maintenance procedures performed using
             Device Support Facilities.


Step 1: Identifying the Need for Service

Generating EREP reports daily should be part of your routine maintenance process.
If an abnormal condition is reported by a user or noted from a system console
message, you may want to confirm the occurrence of the incident using the Service
Information Messages report or Asynchronous Notification Record Detail report.


The severity field of the console SIM Alert tells you whether you can defer further
handling of the error. For a complete description of SIM severity fields, see “SIM
Severity Levels” on page 31.


Use the console SIM Alert or the EREP reports to determine whether the failing
component is the hardware (DASD ALERT or SERVICE ALERT) or the disk media
(MEDIA ALERT).


If the source of the error is hardware, the console DASD ALERT or the EREP reports
will provide the information needed to place a service call. The EREP reports
contain additional information about the impact the service action will have on your
operation. Save the report for your own reference and for your IBM service
representative to review. No further action is required.


If the source of the error is the disk media, the nature of the condition is determined
by the severity field in the console MEDIA ALERT or by the temporary or permanent
designation in the EREP reports.


A MEDIA ALERT provides the volume serial number (volser) and track address for
the failing volume. The EREP reports provide similar information. Perform step 2.


Step 2: Perform Media Maintenance

Each media SIM that is generated contains a media maintenance procedure
number. In the console MEDIA ALERT, the media maintenance procedure number
is shown as the last digit of the REFCODE. MEDIA ALERTs that are presented in the
EREP reports also provide you with the appropriate media maintenance procedure
number. Each number corresponds to a specific Device Support Facilities command
or set of commands to be run on a specific track or tracks. Appendix B, “Specific
Guidelines by DASD Type” on page 81 lists each Device Support Facilities media
maintenance procedure.


Automating the Steps in Handling a SIM


It is possible to capture SIM information and automatically initiate tasks to assist
you in handling the SIM. This capability allows you to create a series of commands
that review SIM Alerts sent to the operator console and take whatever action is
appropriate for your installation. In an MVS environment, you can use the NetView
facility to automate console message handling. In the VM environment, you can use
the VM Programmable Operator (PROP) facility.
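
Whatever facility you use, the decision it automates is the two-step process described above. The Python below is a minimal sketch of that decision only, separate from the manual's REXX samples; the function name `next_step` and the returned strings are hypothetical, and the REFCODE is assumed to have the nnnn-nnnn-nnnn form shown in Figure 5.

```python
def next_step(component, refcode):
    """Return the follow-up action for a console SIM Alert.

    component -- 'DASD' or 'MEDIA', as shown in the console message
    refcode   -- REFCODE value in the form nnnn-nnnn-nnnn
    """
    if component == "DASD":
        # Step 1: hardware problems go to the service representative.
        return "call your service representative"
    if component == "MEDIA":
        # Step 2: the last character of the REFCODE is the procedure number.
        procedure = refcode.strip()[-1]
        return "perform media maintenance procedure " + procedure
    raise ValueError("unexpected SIM Alert component: " + repr(component))


print(next_step("MEDIA", "4340-8880-2283"))
print(next_step("DASD", "43C0-8080-E602"))
```

The REFCODE values used here come from the sample report in Figure 10; what each procedure number requires is described in Appendix B.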


To ensure that console messages are created, the 3990 SIM Severity Reporting
Option must be set to 0. For more information on how to set the SIM Severity
Reporting Option, see the IBM 3990 Storage Control Planning, Installation, and
Storage Administration Guide.


Using NetView in an MVS Environment 




NetView Release 3 runs on MVS as an automated operations facility. NetView 
automatically processes MVS console messages and executes EXECs, which in turn 
execute MVS commands. For example, you can create an EXEC that can examine a 
SIM and, in turn, invoke Device Support Facilities to perform media maintenance as 
required by the SIM. 


Refer to “Understanding Concurrent Media Maintenance” on page 26 and “Using 
Concurrent Media Maintenance” on page 58 for information on concurrent media 
maintenance. 


Note: You can use NetView to suppress console SIM Alerts. For more information 
on using NetView for tasks such as suppressing console messages, see: 


NetView Customization: Writing EXECs 


If you have NetView Release 3 installed, there are two steps you must follow to
enable automated message handling.


1. Modify the NetView message automation table. Since the process is triggered
   by a console message, you must place the message ID in the automation table.
   For console SIM Alerts, the message ID is IEA480E. This will automatically
   invoke the EXEC that is specified in the automation table.

   The following example shows how NetView begins the automated message
   handling process. This entry will capture the MEDIA ALERT, resulting in the
   execution of the CMDSF EXEC.

2. Install the EXEC in your NetView EXEC library. The following sample EXEC is
   called CMDSF, but you can choose another name for it. Also, you may want to
   modify the EXEC to better suit your installation’s needs. For example, it might
   not be appropriate to automatically schedule media maintenance on all
   volumes. The increased I/O load caused by the media maintenance process
   may not be acceptable on page volumes, or certain database volumes. To
   exclude these volumes from the process, modify the EXEC. Another change you
   might want to make is to limit the automatic invocation to certain times of the
   day.


NetView is a trademark of the International Business Machines Corporation.


A sample EXEC is shown in Figure 13.


Note: The sample provided is intended to serve as a pattern. You should fully test
the EXEC after inserting any installation dependent code.
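
The two modifications suggested in step 2, excluding certain volumes and limiting automatic invocation to certain times of the day, reduce to a simple guard evaluated before the job is submitted. The Python below is a minimal sketch of that guard logic only, not a replacement for the REXX EXEC; the volume serials, the overnight window, and the name `may_submit` are all hypothetical and would be chosen by your installation.

```python
from datetime import time

# Hypothetical values: volumes never handled automatically (for example, page
# or database volumes) and an overnight window for automatic invocation.
EXCLUDED_VOLUMES = {"PAGE01", "DB2001"}
WINDOW_START = time(22, 0)
WINDOW_END = time(6, 0)


def may_submit(volser, now):
    """Return True if media maintenance may be submitted automatically."""
    if volser in EXCLUDED_VOLUMES:
        return False
    if WINDOW_START <= WINDOW_END:
        # Window falls within a single day.
        return WINDOW_START <= now <= WINDOW_END
    # Window wraps past midnight.
    return now >= WINDOW_START or now <= WINDOW_END


print(may_submit("TSO8E2", time(23, 30)))
print(may_submit("PAGE01", time(23, 30)))
```

In an EXEC, the equivalent test would sit after the volume serial has been extracted from the message and before the job is written to the internal reader.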


/* REXX ***************************************************************/
/*                                                                    */
/* EXEC Name - CMDSF                                                  */
/*                                                                    */
/* Descriptive name - Invokes ICKDSF INSPECT                          */
/*                                                                    */
/* Function - This EXEC is used to run INSPECT using the              */
/*            Concurrent Media Maintenance facilities of              */
/*            ICKDSF. It is intended to be invoked                    */
/*            automatically from the NetView facility.                */
/*            The MEDIA SIM console message is parsed into            */
/*            the necessary fields to build the necessary             */
/*            control cards to submit ICKDSF.                         */
/*                                                                    */
/* Scope:   This EXEC is intended to serve as AN EXAMPLE of           */
/*          how you can automate the initiation of a                  */
/*          Concurrent Media Maintenance task. Its purpose,           */
/*          therefore, is to serve as a pattern. It is still          */
/*          recommended that you fully test the EXEC after            */
/*          inserting any installation dependent code.                */
/*                                                                    */
/* Notes -  1. This EXEC will address temporary errors reflected      */
/*             in media SIMs.                                         */
/*          2. This EXEC contains the pattern JOBCARD that will       */
/*             have to be adapted for your installation. These        */
/*             three statements (JCL1, JCL2, JCL3) are found          */
/*             towards the end of this EXEC.                          */
/*          3. This EXEC may also be modified, for example,           */
/*             to not process certain volumes, or change the          */
/*             message routing.                                       */
/*                                                                    */
/* Normal exit conditions -                                           */
/*   RC is the return code that gets passed back to the caller        */
/*   RC = 0 when the ICKDSF job was successfully submitted. Check     */
/*          the ICKDSF output for the return code from the            */
/*          INSPECT process.                                          */
/*                                                                    */
/**********************************************************************/

Address 'COMMAND'

MsgText = MSGSTR()                      /* Retrieve message without id  */

TypePos = POS('MEDIA',MsgText)          /* Look for MEDIA SIMs only     */
If TypePos = 0 Then Exit 0              /* Terminate if not MEDIA SIM   */

SevPos = POS('SERVICE',MsgText)         /* Look for temp errors only    */
If SevPos = 0 Then Exit 0               /* Terminate if not temps       */

RefPos = POS('REFCODE',MsgText)         /* Look for the REFCODE.        */
If RefPos = 0 Then Exit 0               /* Terminate if none            */

If SUBSTR(MsgText,RefPos+21,1) ¬= 9 Then /* Check Media Maintenance Proc */
   Exit 0                               /* Code - Terminate if not a    */
                                        /* correctable error            */

If TypePos > SevPos Then Exit 0         /* The word MEDIA found         */
                                        /* somewhere else in message    */

UnitAdd = SUBSTR(MsgText,TypePos-5,4)   /* Get UNIT address             */

VolPos = POS('VOLSER=',MsgText)         /* Locate beginning of volser   */
If VolPos = 0 Then Exit 0               /* Terminate if no Volume in msg*/
VolPos = VolPos + 7                     /* Adjust to 1st pos of volser  */

VolEnd = POS(',',MsgText,VolPos)        /* Locate end of volser         */
Volume = SUBSTR(MsgText,VolPos,VolEnd-VolPos)   /* Extract Volser       */
CHPos = POS('CCHH=',MsgText)            /* Locate CCHH type in msg      */
If CHPos = 0 Then Exit 0                /* Terminate if no CCHH in msg  */
CC = SUBSTR(MsgText,CHPos+7,4)          /* Get Cylinder                 */
HH = SUBSTR(MsgText,CHPos+12,4)         /* Get Track                    */

CCHH = "X'"CC"',X'"HH"'"                /* Form full Hexadecimal CCHH   */

/* The following steps build the ICKDSF job with control statements, */
/* allocate the internal reader (INTRDR), and submit the job.        */
/* Additional JCL changes may be required to suit your installation. */

/*-------------------------------------------------------------------*/
/* The following 3 statements must be changed to reflect a valid     */
/* JOBCARD in your installation.                                     */
/*-------------------------------------------------------------------*/
JCL1  = '//*JOBCARD JOB (ACCOUNT INFO) ...'
JCL2  = '//*'
JCL3  = '//*******************************************************'
JCL4  = '//* FUNCTION - RUN INSPECT ON CORRECTABLE DATA CHECK    *'
JCL5  = '//*            AS INDICATED IN MEDIA SIM                *'
JCL6  = '//*******************************************************'
JCL7  = '//DSFRUN   EXEC PGM=ICKDSF'
JCL8  = '//INSPDD   DD  UNIT=SYSDA,VOL=SER='Volume',DISP=OLD'
JCL9  = '//SYSPRINT DD  SYSOUT=*'
JCL10 = '//SYSIN    DD  *'
JCL11 = ' IODELAY SET MSECONDS(100)'
JCL12 = ' ANALYZE DDNAME(INSPDD) DRIVETEST NOSCAN'
JCL13 = ' IF LASTCC < 8 THEN -'
JCL14 = '   INSPECT DDNAME(INSPDD) NOVERIFY -'
JCL15 = '     TRACKS('CCHH') -'
JCL16 = '     SKIP CHECK(2) ASSIGN'

ADDRESS TSO 'FREE F(DD1)'
ADDRESS TSO 'ALLOC F(DD1) SYSOUT(A) WRITER(INTRDR) RECFM(F) LRECL(80)'
ADDRESS MVS 'EXECIO 16 DISKW DD1 (STEM JCL'

EXIT


Figure 13. Example of the MVS REXX EXEC


Using the PROP Facility in VM


VM/XA SP* and VM/SP HPO contain the PROP facility, which is capable of intercepting
messages, such as SIMs, and handling them with preprogrammed actions. For
example, you can create a Restructured Extended Executor (REXX) EXEC that
invokes EREP to generate the Service Information Messages report.


Note: You can use PROP to suppress console SIM Alerts that are handled by the
system-assisted facility. For more information on PROP and suppressing messages,
see:


• VM/SP HPO System Programmer’s Guide
• VM/SP HPO CP for System Programming
• VM/XA SP Planning and Administration.


To use the automated message handling capability in VM, you must change your
Programmable Operator Message Table (RTABLE). This change, shown in
Figure 14, causes the MEDIA ALERT to be intercepted. The REXX EXEC is invoked
whenever this message is issued. This REXX EXEC is shown in Figure 15 on
page 42.


Figure 14. Modifications to the RTABLE for VM 


The REXX EXEC provided can be tailored to suit your installation’s needs. For 
example, you might want to modify the EXEC to specify your level of operating 
system. 


Note: The sample provided is intended to serve as a pattern. You should fully test 
the EXEC after inserting any installation dependent code. 


* VM/XA SP is a trademark of the International Business Machines Corporation.


/**********************************************************************/
/* EXEC name: EREPSIM                                                 */
/*                                                                    */
/* This sample REXX EXEC produces an EREP System Exception Report for */
/* use in real-time problem determination for DASD subsystems. EREP   */
/* message data and the EREP Subsystem Exception Report will be sent  */
/* to printer files.                                                  */
/*                                                                    */
/* NOTE1: There should be at least 7M of virtual storage.             */
/* NOTE2: No tapes should be attached at 181 or 182.                  */
/* NOTE3: The VM/SP HPO user must have a privilege class that allows  */
/*        the execution of the CPEREP module; the IBM supplied        */
/*        defaults would be privilege classes C, E, or F.             */
/* NOTE4: The VM/XA user must have access to the minidisk containing  */
/*        the error recording file and have filemode X and virtual    */
/*        device address 195 available to LINK to it; a password may  */
/*        need to be supplied.                                        */
/**********************************************************************/

'GLOBAL TXTLIB ERPTFLIB ERFTRLIB EREPLIB'
erep = 'CPEREP'
parse value(diag(0)) with 1 sys 9 .
if sys = 'VM/XA SP'
then do
   erep = 'CPEREPXA'
   'EXECIO 0 CP ( STRING LINK EREP 191 195 RR'
   'ACCESS 195 X'
   'COPYFILE XAEREPIO RECORD X XAEREPIO RECORD A1 ( REPLACE'
   'RELEASE 195 ( DET'
end

/**********************************************************************/
/* Step 1. create a working copy of SIM and OBR records               */
/**********************************************************************/

'ERASE WRKCOPY EREP'
push; push 'HIST=N,ACC=Y,ZERO=N,TABSIZE=999K,TYPE=AO,PRINT=NO'
'FILEDEF ACCDEV DISK WRKCOPY EREP ( RECFM VB BLKSIZE 12000'
'FILEDEF TOURIST PRINTER ( BLKSIZE 133'
erep
if rc ¬= 0 then exit rc
'ESTATE WRKCOPY EREP A'
if rc ¬= 0
then do;
   say 'EREP step 1 did not create a history file'
   exit rc
end

/**********************************************************************/
/* Step 2. create the System Exception Report for DASD subsystems.    */
/**********************************************************************/

push; push 'HIST=Y,ACC=N,ZERO=N,TABSIZE=999K,DEV=(33XX),SYSEXN=Y'
'FILEDEF ACCIN DISK WRKCOPY EREP'
'FILEDEF TOURIST PRINTER ( BLKSIZE 133'
erep
exit rc


Figure 15. REXX EXEC Used for Automated Generation of EREP Reports in VM


Chapter 4. Performing Media Maintenance on Non-SIM DASD


This chapter describes how to determine when media maintenance is necessary 
and what Device Support Facilities functions are required to perform it. Media 
maintenance techniques are centered around the use of the System Exception 
Reports produced by EREP. 


Before reading this chapter, you should have a basic understanding of error 
characteristics, how the storage subsystem and operating system contribute to the 
error handling process, and a basic understanding of EREP functions. This 
information is described in detail in “Chapter 1. Introduction” on page 15 and
“Chapter 2. The Error Handling Process” on page 21.


Handling Errors on Non-SIM DASD 


First, you must establish that an error situation exists. There are several ways of 
discovering that an error has occurred: 


• Your regular review of System Exception Reports reveals temporary error
  situations requiring attention.


• Messages displayed at a terminal or console provide immediate notification of
  permanent errors.


• Users may notify you of a storage error that occurred during execution of an
  application program.


The System Exception Reports produced by EREP are the most useful and complete 
means to verify that errors have occurred, and these reports are a necessary tool in 
following up on errors brought to your attention in other ways. 


Most permanent errors cause a console message to be generated. Console 
messages may not provide the information in a form you need to evaluate an error 
situation. 


Using Concurrent Media Maintenance on Non-SIM DASD 


It is important to remember that concurrent media maintenance allows you to 
continue to access the data while using Device Support Facilities to perform 
maintenance on the track. For more information on concurrent media maintenance, 
see “Understanding Concurrent Media Maintenance” on page 26 and “Using 
Concurrent Media Maintenance” on page 58. 


If your media maintenance actions do not correct the problem, call your IBM service 
representative for assistance. Have the associated EREP reports and Device 
Support Facilities output available for the service representative. 


Identifying the Error 
The means used for handling the problem depends on the source of the error, and
the Subsystem Exception DASD report designates the source as a probable failing
unit.


Console messages and application program facilities usually describe an error in
terms of its type. The source of an error is a more reliable basis for recovery than 
the type of error because a type of error can have different sources. For instance, a 
data check type of error might be caused by a problem in the device hardware or by 
a defect on the disk surface. The source of an error points to where recovery action 
should be applied and to the means for recovery, as described in “The Steps in 
Handling Errors on Non-SIM DASD” on page 53. 


Console messages may go unnoticed because they require no response or operator 
intervention. You may obtain listings of all console messages that have occurred 
over a period of time or in connection with a specific processing job. In these 
listings, however, disk storage messages are intermixed with other messages and 
may not be easy to locate. This is why it is important to run EREP System Exception 
reports for more information. 


Reports of disk storage errors from users always warrant further investigation. It’s 
a good idea to request as much information as possible concerning the error and 
the conditions under which it occurred. If possible, obtain the job log from the 
processing step that incurred the error. System Exception Reports provide 
additional information on such errors. 


Using the System Exception Reports 


It is essential that you run all EREP reports that pertain to disk errors on a daily 
basis. The following sections present detailed information on reading and using the 
three System Exception Reports that are intended for DASD media maintenance. 


Although a variety of System Exception Reports are available through EREP, only 
three reports are applicable to disk media errors for non-SIM DASD. The other 
reports apply to other components, or are for use by IBM service representatives. 


The System Exception Reports produced by EREP will be most useful to you in 
handling data checks. (Data checks are errors detected when data is read.) The 
reports also provide information that will help you decide if and when to call for 
service. 


The reports for disk storage errors, discussed in this chapter, are listed here in the 
order in which you use them in identifying and handling a media error situation. 


1. System Error Summary (Part 2) lists incidents of permanent I/O errors (data or
   equipment checks) and identifies each error by job name and time.


2. Subsystem Exception DASD provides information on accumulated permanent
   and temporary I/O errors.


3. DASD Data Transfer Summary presents details on data checks.


Your use of the reports will be more efficient if you locate volume information
quickly. Figures 16 through 18 provide a “snapshot” of volume information
locations for each type of report. The reports are numbered in the order you
would use them for identifying and handling errors.


Samples of Volume Information


SYSTEM ERROR SUMMARY REPORT DATE 225 88 
(PART 2) PERIOD FROM 224 88 
TO 225 88 


PHYSICAL PHYSICAL ERROR PROBABLE 
TIME JOBNAME CPU ID TYPE ADDRESS PATH VOLUME ERROR DESCRIPTION FAILING UNIT 


DATE 224/88 
19:21:13:42 TSODO2W A XX-84-05 3380 0335 04-0335 SYSTSL PERMANENT EQUIPMENT CHECK DEVICE 


DATE 225/88 
19:20:14:81 ICOTPSG 0D XX-0A-03 3380 07C3 50-07C3 PSG091 PERMANENT DATA CHECK VOLUME


Figure 16. How Volumes are Listed in the System Error Summary Report 


SUBSYSTEM EXCEPTION REPORT DATE 097 89 
DASD PERIOD FROM 052 89 
TO 052 89 


B-BUS OUT PARITY CHK   C-CHECK DATA CHK   D-DISKETTE CHK   I-INVOKED OFFSETS
PROBABLE ---IMPACT OF TEMPORARY ERRORS----
FAILING FAILURE PHYSICAL -----TOTALS----- EQU
UNIT AFFECT CPU ADDRESS SIMS PERM TEMP CHK SKS RD OVRN OTHER
********************************************************************************
DEV 01.X-17 SEEK TOTAL 2 1 1

3380-JK 00 01.2-17 1 1

00 01.2-17 2


02.X-17 
3380-JK 


VOL  RASC77 DATAXFR 
3380-JK 


Figure 17. How Volumes are Listed in the DASD Subsystem Exception Report 


2 
DASD DATA TRANSFER SUMMARY REPORT DATE 225 88 
PROBABLE FAILING UNIT - VOLUME PERIOD FROM 224 88 
TO 225 88 


SENSE COUNTS 
TEMPORARY 
OFFSET INVK THRESHOLD 
PERM NO YES LOGGING 


********************************************************************************


SEQUENCE BY VOLUME LABEL, HEAD, CYLINDER 


UNITADDRESS 07C3  DEVTYPE 3380 VOLUME PSG091 
CPU A PHYSICAL ADDRESS XX-0A-03 


FAILURE AT ADDRESS: CYLINDER 0341 HEAD 01 
08800000 83551141 01550001 16860B00 90000000 00114941
LAST SENSE AT: 225/88 15:56:31:25 


Figure 18. How Volumes are Listed in the DASD Data Transfer Summary 


Figure 19 compares the contents of the three pertinent System Exception Reports.
Detailed instructions for reading and using these reports are provided on the
following pages. For the sample reports that are shown, report elements irrelevant
to the handling of media errors are not discussed. See the EREP User’s Guide and
Reference for complete information on defining report requests to meet your needs.


                        System Error Summary          Subsystem Exception           DASD Data Transfer
                        (Part 2)                      DASD                          Summary

Type of errors          Permanent: Equipment and      Permanent and temporary:      Permanent and temporary:
                        data checks                   Equipment and data checks     Data checks

What is reported        Each incident (job and time)  Accumulated total errors      Details by volume
                                                                                    (location of error)

Probable failing units  Hardware and volume           Hardware and volume           Volume


Figure 19. Comparison Summary of Relevant System Exception Report Contents


System Error Summary (Part 2) Report 


The System Error Summary (Part 2) report applies to disk and tape errors. (Part 1
describes processor and channel checks.) The report lists each incident of a
permanent I/O error. The type of error might be a data check or an equipment
check. The errors are in sequence according to the time they occurred.


The System Error Summary report provides a quick perspective on all permanent
I/O errors during the time period covered by the report and is the starting point for
DASD error management. Figure 20 on page 47 shows an example of a System
Error Summary report.


SYSTEM ERROR SUMMARY                                             REPORT DATE 097 89
(PART 2)                                                         PERIOD FROM 052 89
                                                                          TO 052 89

                         PHYSICAL         PHYSICAL ERROR                                PROBABLE
TIME        JOBNAME  CPU ID       TYPE    ADDRESS  PATH    VOLUME ERROR DESCRIPTION     FAILING UNIT

DATE 052/89
00:14:01:27 EOS EXIT 00  02.X-17  3380-JK 0C77     4D-0C77 RASC77 PERMANENT SEEK CHECK  DEVICE
00:14:58:56 EOS EXIT 00  01.X-17  3380-JK 0C77     5D-0C77 RASC77 PERMANENT SEEK CHECK  DEVICE
00:28:41:28 EOS EXIT 00  01.X-17  3380-JK 0C77     4C-0C77 RASC77 PERMANENT SEEK CHECK  DEVICE
00:41:00:47 EOS EXIT 00  02.X-17  3380-JK 0C77     4D-0C77 RASC77 PERMANENT SEEK CHECK  DEVICE

********************************************************************************

CPU  MODEL   SERIAL
00   3090XA  373826


Figure 20. The System Error Summary Report


Key Description 


TIME 


Dates and times the errors occurred. The time shown is for each permanent 
error. The first four numbers are the hours and minutes. The next four 
numbers are seconds and hundredths of seconds. 


JOBNAME 


Name of the job in progress when the permanent error occurred. The name 
can be as many as eight alphameric characters and is assigned by the 
programmer. 


CPU 
Numeric characters identifying the processor that received the error record. 


At the bottom of the report, the processor identifiers are given with their model 
and serial numbers. 


PHYSICAL ID 


A unique identifier for the device, set by the service representative at 
installation time. The format of this ID varies by device type. 


TYPE 
Device Type, for example 3380, in the JK model class. 


Note: The DASD reports, which include both permanent and temporary errors, 
might include other disk storage types that are not included in the System 
Error Summary (Part 2) if there are no permanent errors during this reporting 
period. 


PHYSICAL ADDRESS 
The unit address known to the operating system. 
ERROR PATH 


The address designation that identifies the components involved in the data 
transfer that incurred the error. This is the address actually used for selection, 
and is the address from which sense information was received. 


VOLUME 


The 6-character volume serial number identifying the logical volume at the address
that incurred the error.


ERROR DESCRIPTION
Type of permanent error: either equipment check or data check.


This information is useful during Step 1 of the error handling process for
permanent errors shown in Figure 23 on page 53.


PROBABLE FAILING UNIT


Probable error source determined by the EREP program. It could be a
channel, storage control, controller, device, or volume.


When the probable failing unit is a volume, the volume serial number on this 
report is used to continue the investigation of the error with the Subsystem 
Exception DASD and the DASD Data Transfer Summary reports. 


This information is useful during Step 1 of the error handling process for 
permanent errors shown in Figure 23 on page 53. 
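

The TIME key above can be made concrete with a small parsing sketch. It is illustrative only: the function name is hypothetical, and it assumes the report prints dates as day-of-year/two-digit-year (as in "DATE 224/88") and times as hours, minutes, seconds, and hundredths separated by colons.

```python
from datetime import datetime


def parse_error_timestamp(date_field: str, time_field: str) -> datetime:
    """Combine a 'DDD/YY' report date and an 'HH:MM:SS:hh' time field.

    Assumption: DDD is the day of the year and hh is hundredths of a second.
    """
    day = datetime.strptime(date_field, "%j/%y")  # day-of-year, 2-digit year
    hours, minutes, seconds, hundredths = (int(p) for p in time_field.split(":"))
    return day.replace(hour=hours, minute=minutes, second=seconds,
                       microsecond=hundredths * 10_000)


# Values taken from the first sample row in Figure 16.
stamp = parse_error_timestamp("224/88", "19:21:13:42")
print(stamp.isoformat())
```

Converting the fields to one timestamp makes it easier to sort incidents from several reports or to match an incident against a job log.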


DASD Subsystem Exception Report 
The Subsystem Exception DASD report lists accumulated permanent and temporary 
errors. The accumulated errors are given for each unit in the probable failing unit 
category. For example, each volume with errors is listed in the volume category. 
The accumulated total includes each permanent error in the System Error Summary 
(Part 2). Description of the type of error depends on the probable failing unit. 


• If the probable failing unit is a hardware component, permanent and temporary
errors can be data or equipment type errors.


• If the probable failing unit is a volume, both permanent and temporary errors
are always data type errors.


The hardware probable failing units are listed first, and the volume probable failing 
units last. Figure 21 on page 49 shows a sample of this report. 


The DASD Subsystem Exception report highlights problems related to disk storage 
operation that may need further investigation and treatment. If the span of error 
records in the report covers more than three days, a message is printed at the top of 
the report. A report that spans a broad period of time might not provide the most 
accurate probable failing unit indication, because corrective action might have been 
taken. 


For certain device types, you establish the limit for the number of temporary data 
errors acceptable in your data processing complex. Probable failing units with 
temporary data errors below this limit are not printed. If there are units with errors 
not reported because the errors did not exceed the limit, a message gives the total 
number of such units. Limits can be set for each type of error, and for each storage 
control and disk storage type. You specify these with LIMIT control statements. 


Refer to EREP User’s Guide and Reference for instructions on specifying limits. If
no limits are specified, all temporary errors are listed. It is recommended that you
do not specify error limits.


For more recent DASD types (listed below), you do not specify temporary error
limits because the subsystem controls this process: 


3370 attached to 3880 storage controls 
3375 
3380 
3390 
9332 
9335 


SUBSYSTEM EXCEPTION                                              REPORT DATE 097 89
DASD                                                             PERIOD FROM 052 89
                                                                          TO 052 89


B-BUS OUT PARITY CHK   C-CHECK DATA CHK   D-DISKETTE CHK   I-INVOKED OFFSETS


PROBABLE ---IMPACT OF TEMPORARY ERRORS----
FAILING FAILURE PHYSICAL -----TOTALS----- EQU
UNIT AFFECT CPU ADDRESS SIMS PERM TEMP CHK SKS RD OVRN OTHER


********************************************************************************


DEV 01.X-17 SEEK TOTAL 2 1 1
3380-JK 00 01.2-17 1 1
00 01.2-17 2
02.X-17 SEEK TOTAL 2
3380-JK 00 02.1-17 2
--------------------------------------------------------------------------------
VOL RASC77 DATAXFR TOTAL 1 1
3380-JK 00 01.3-17 1 1-1


********************************************************************************


0 UNIT(S) EXCLUDED DUE TO LIMITS


** ENTRIES WITH AN ASTERISK INDICATE THAT DASDID CARDS WERE NOT FOUND FOR THE UNIT.


NOTE: "IMPACT OF TEMPORARY ERRORS" IS THE NUMBER OF TIMES ERROR THRESHOLD HAS BEEN EXCEEDED.
NOTE: BLANK ENTRIES INDICATE ZERO VALUES OR NOT APPLICABLE. N/A = NOT AVAILABLE.
NOTE: ZERO ENTRIES INDICATE RECORDS EXIST IN EREP REPORTS BUT THRESHOLDS WERE NOT EXCEEDED.


Figure 21. DASD Subsystem Exception Report Example


Key Description
PROBABLE FAILING UNIT


Listed by probable failing unit categories: channel (CHAN), storage control
(SCU), controller (CTLR), device (DEV), and volume (VOL). All units in each
category that had errors are listed with probable failing unit identifiers and
device type number. When a physical ID identifies a probable failing unit, the
physical ID represents a real physical ID set with switches, or a physical ID
made up especially for the EREP program based on an address.


This information is useful during Step 2 of the error handling process shown in
Figure 23 on page 53.


CPU 


As in the System Error Summary (Part 2), the numeric identifier identifies the 
processor that received the error records. At the bottom of the report, all 
processor alphabetic identifiers are given with their model and serial 
numbers. 


PHYSICAL ADDRESS 


An identifier of a serviceable unit. A four-digit physical address, or a physical
ID for units that have physical IDs.


--TOTALS--
SIMS PERM TEMP


The SIMS column is blank for DASD that do not produce SIMs. The numbers
under PERM and TEMP indicate total permanent and total temporary errors for
each unit. If the probable failing unit is a volume, the permanent and
temporary errors are always data checks on read operations.


IMPACT OF TEMPORARY ERRORS


The total shown under TEMP is the sum of the individual totals shown under
IMPACT OF TEMPORARY ERRORS.


This information is useful during Step 1 of the error handling process for
temporary errors shown in Figure 23 on page 53.


A statement about unknowns may be printed after the regular listings. An unknown 
indicates that a probable failing unit could not be determined from the sense 
information. Also at the bottom of the report is the number of units excluded 
because of limits set by the processing complex on temporary errors reported. 
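

As a rough illustration of the key descriptions above, the sketch below models one entry of the Subsystem Exception DASD report. The class and field names are hypothetical, and the sample values are made up for the example; only the category abbreviations (CHAN, SCU, CTLR, DEV, VOL) and the rule that TEMP equals the sum of the IMPACT OF TEMPORARY ERRORS columns come from the text.

```python
from dataclasses import dataclass, field

# Probable failing unit categories that denote hardware rather than media.
HARDWARE_CATEGORIES = {"CHAN", "SCU", "CTLR", "DEV"}


@dataclass
class ExceptionEntry:
    category: str          # CHAN, SCU, CTLR, DEV, or VOL
    unit_id: str           # physical ID or volume serial number
    perm: int = 0          # total permanent errors
    temp: int = 0          # total temporary errors
    impact: dict = field(default_factory=dict)  # per-column temporary counts

    def temp_matches_impact(self) -> bool:
        """TEMP should equal the sum of the IMPACT OF TEMPORARY ERRORS columns."""
        return self.temp == sum(self.impact.values())

    def source_kind(self) -> str:
        """Classify the probable failing unit as hardware or disk media."""
        if self.category in HARDWARE_CATEGORIES:
            return "hardware"
        if self.category == "VOL":
            return "volume"
        return "unknown"


# Hypothetical entry: one temporary read error on a volume.
entry = ExceptionEntry("VOL", "RASC77", perm=0, temp=1, impact={"RD": 1})
print(entry.source_kind(), entry.temp_matches_impact())
```

A check like `temp_matches_impact` is mainly useful for catching transcription mistakes when report values are keyed in by hand.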


DASD Data Transfer Summary 
Although the DASD Data Transfer Summary Report has a Volume and an Other
section, only the Volume section is discussed here. In the Volume section, all errors
are data errors, and the volumes are the probable failing units. For these, you can 
use Device Support Facilities for recovery action. For errors in the Other section, 
you should contact an IBM service representative. 


The DASD Data Transfer Summary report provides details on data checks. All 
volumes listed in the DASD Subsystem Exception report (where they are listed with 
a total count of errors) are provided here with details of the data errors for that 
volume. Figure 22 on page 51 shows a sample report. 


Information is provided for all permanent data errors, because permanent errors 
are recorded in the ERDS. For temporary errors, the information is provided if the 
error description (not just the count) is logged in the ERDS. Whether a temporary 
error is logged depends on the disk storage device type, and the area in which the 
error occurred.


DASD DATA TRANSFER SUMMARY                                       REPORT DATE 225 88
PROBABLE FAILING UNIT - VOLUME                                   PERIOD FROM 224 88
                                                                          TO 225 88

SENSE COUNTS
TEMPORARY
OFFSET INVK THRESHOLD
PERM NO YES LOGGING


********************************************************************************


SEQUENCE BY VOLUME LABEL, HEAD, CYLINDER


UNITADDRESS 07CC DEVTYPE 3380 VOLUME ASCR84
CPU A PHYSICAL ADDRESS XX-0A-0C


FAILURE AT ADDRESS: CYLINDER 0708 HEAD 00
00001000 8CC42053 02C40000 0B700B00 1100057A 60000000
LAST SENSE AT: 224/88 14:32:58:32


FAILURE AT ADDRESS: CYLINDER 0004 HEAD 03
00001000 8C040353 00040003 02700B00 11004762 00040000
LAST SENSE AT: 225/88 23:50:36:53


FAILURE AT ADDRESS: CYLINDER 0083 HEAD 07
00003000 8C530743 00530007 045E0B51 00000000 00114943
LAST SENSE AT: 225/88 11:58:18:85


UNITADDRESS 0B54 DEVTYPE 3350 VOLUME ICOM15
CPU B PHYSICAL ADDRESS 0B54


FAILURE AT ADDRESS: CYLINDER 0003 HEAD 08
08004000 08030853 00030008 02210000 12000496 00040000
LAST SENSE AT: 224/88 09:51:38:40


FAILURE AT ADDRESS: CYLINDER 0018 HEAD 08
08004000 08120853 00120008 02210000 121401AC 00200000
LAST SENSE AT: 225/88 22:52:06:17


UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091
CPU A PHYSICAL ADDRESS XX-0A-03


FAILURE AT ADDRESS: CYLINDER 0341 HEAD 01
08800000 83551141 01550001 16860B00 90000000 00114941
LAST SENSE AT: 224/88 15:56:31:25


FAILURE AT ADDRESS: CYLINDER 0571 HEAD 03
00003000 833B2343 023B0003 01050B11 00000000 00114943
LAST SENSE AT: 225/88 11:24:01:79


Figure 22. DASD Data Transfer Summary Example


Key Description 


UNITADDRESS xxxx DEVTYPE xxxx-xx VOLUME xxxxxx 
CPU x PHYSICAL ADDRESS xxxx 


Detailed error information is grouped by volume. These lines serve as a
“header” for the error details for a volume and specify the unit address used to
select the volume, the type of device (for example, 3380), and the volume
serial number. In addition, the processor designation and the physical
address for the failing volume are provided.


FAILURE AT ADDRESS: CYLINDER xxxx HEAD xx


This is the track address where the error occurred. For count-key-data types, 
addresses are cylinder and head numbers. For fixed block architecture, the 
designation is FAILURE AT BLOCK and the address is expressed as a relative 
block number or as cylinder, head, and sector numbers. Addresses on this 
line are expressed as decimal values. 


This information is useful during Step 3 of the error handling process shown in
Figure 23 on page 53.


Sense Information 


This is the sense record received from the subsystem. It is either 24 or 32 
hexadecimal bytes of sense data. If more than one error is reported for an 
address, the sense information applies to the last error. The sense
information is primarily for use by the service representative. However, you
need the information in bytes 22 and 23 for interpreting certain error conditions
described in Appendix B, “Specific Guidelines by DASD Type” on page 81.


Byte 0                                       Bytes 22 & 23
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX (XXXXXXXX XXXXXXXX)


Below each sense record is the date and time of this last sense record.


PERM
Error counts for permanent data checks are shown.


SENSE COUNTS
TEMPORARY


Error counts for temporary data checks are shown to the right of permanent 
error counts. There are two Offset Invoked columns for temporary errors. No 
or Yes values for temporary errors are interpreted as follows: 


• For the 3330, 3340, 3350, and 3370s attached directly to the 4321, 4331, or
4361, the values under “Temporary” are all logged temporary data errors.
The value is always listed under “Offset Invoked, No.” (Zero always
appears under Threshold Logging.)


• For the 3375, 3380, and those 3370s not attached directly to the 4321, 4331,
or 4361, the value under “Offset Invoked, No” is the number of times the
data error rate threshold for the volume was exceeded.


The value under “Offset Invoked, Yes” is the actual number of errors that
were logged with offset. See “Error Handling for 3375 and 3380” on
page 96 for further information on “Offset Invoked, Yes.”


THRESHOLD LOGGING 


This information is related to temporary errors and does not apply to the 3330, 
3340, 3350, or 3370 attached directly to 4321, 4331, or 4361. This value is the 
number of errors at that cylinder and head address during the error reporting 
interval. Other volumes on the same string may show values in this column, 
even if there are no values under the other error columns. This is because all
volumes on the string may be placed in logging mode when any volume
causes logging mode to begin.


This information is useful during Step 3 of the error handling process shown in
Figure 23 on page 53.


It is possible for 3330, 3350, and 3370 disks (not attached by way of the 3880 control
unit) to have temporary data errors that were not logged. There are no cylinder and 
head numbers available for these errors. In such cases, volumes involved are 
listed after the volume listings for which cylinder and head numbers are available. 
These errors are included in the error count in the Subsystem Exception DASD 
report. See “Chapter 5. Device Support Facilities” on page 57 for information on 
determining the addresses for temporary errors at unknown addresses. 
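

The FAILURE AT ADDRESS and sense information keys described above lend themselves to a short extraction sketch. The helper names are hypothetical; the sketch assumes count-key-data lines in the layout shown in Figure 22 and a sense record printed as space-separated hexadecimal words totalling 24 or 32 bytes.

```python
import re

FAILURE_RE = re.compile(r"FAILURE AT ADDRESS: CYLINDER (\d+) HEAD (\d+)")


def parse_failure_address(line: str) -> tuple:
    """Return (cylinder, head) as decimal integers from a report line."""
    match = FAILURE_RE.search(line)
    if match is None:
        raise ValueError("not a FAILURE AT ADDRESS line")
    return int(match.group(1)), int(match.group(2))


def sense_bytes_22_23(sense_line: str) -> bytes:
    """Return bytes 22 and 23 (counting from byte 0) of a sense record."""
    raw = bytes.fromhex(sense_line.replace(" ", ""))
    if len(raw) not in (24, 32):
        raise ValueError("expected 24 or 32 bytes of sense data")
    return raw[22:24]


# Lines taken from the last entry for volume PSG091 in Figure 22.
print(parse_failure_address("FAILURE AT ADDRESS: CYLINDER 0571 HEAD 03"))
print(sense_bytes_22_23(
    "00003000 833B2343 023B0003 01050B11 00000000 00114943").hex().upper())
```

Interpreting the extracted bytes still requires the device-specific guidance in Appendix B; the sketch only isolates them.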


The Steps in Handling Errors on Non-SIM DASD 


The error handling process outlined here consists of basic steps that show each 
error handling task and the tool(s) involved to complete it. 


In following these procedures, you should evaluate the error situation in relation to 
the specific circumstances, such as the job in progress and the data in use at the 
time. For instance, the same error that would cause you to take prompt action if it 
occurred in a catalog data set might be disregarded (after investigation) if it 
occurred on a volume being used for temporary data. 


The exact procedures that you implement for handling and scheduling error 
recovery activities must be tailored to suit the performance and availability 
requirements of your user community. It is a good idea to document your 
procedures for handling disk storage errors along with your other data processing 
operational procedures. 


The most effective control of operations can be achieved by reviewing error 
information and performing recovery on a regular basis, so that error situations do 
not accumulate. For most situations, your recovery actions with Device Support 
Facilities can be scheduled at a convenient time for minimal impact on system 
performance and availability. It is important to remember that concurrent media 
maintenance allows you to continue to access the data while using Device Support 
Facilities to perform maintenance on the track. For more information on concurrent
media maintenance see “Understanding Concurrent Media Maintenance” on
page 26 and “Using Concurrent Media Maintenance” on page 58.


If your recovery actions do not correct the problem, call your IBM service
representative for assistance. Have the following material available for the service
representative:


• System Exception Reports obtained prior to recovery actions
• Device Support Facilities output
• System Exception Reports obtained following recovery actions.


Figure 23. The Steps in Error Handling


Step  Task                         Tool                           Action

1     Detect error occurrence      System Error Summary (Part 2)  If permanent error exists,
                                   report (permanent errors)      perform Step 2.

                                   DASD Subsystem Exception       If temporary errors require
                                   report (temporary errors)      investigation, perform Step 2.

2     Determine source of errors   DASD Subsystem Exception       If the source of error is hardware,
      using EREP                   report                         call your service representative.

                                                                  If the source of error is volume,
                                                                  perform Step 3.

3     Determine location and       DASD Data Transfer Summary     If location information is complete,
      nature of error using EREP   report                         perform Step 5.

                                                                  If location information is not
                                                                  complete, perform Step 4.

4     Supplement information on    Device Support Facilities      Perform Step 5.
      location and nature of       ANALYZE DRIVETEST SCAN
      error using Device Support
      Facilities

5     Perform media maintenance    Device Support Facilities as   Follow through as specified in
                                   specified in Appendix B,       Appendix B, “Specific Guidelines
                                   “Specific Guidelines by DASD   by DASD Type.”
                                   Type” on page 81
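

The branching in Figure 23 can be summarized as a small decision function. This is a sketch only, with a hypothetical function name and parameters; it encodes the actions listed in the figure and nothing more.

```python
HARDWARE_SOURCES = {"CHAN", "SCU", "CTLR", "DEV"}


def next_action(step: int, *, error_found: bool = True,
                source: str = "VOL", location_complete: bool = True) -> str:
    """Return the follow-up action for a step of the error handling process."""
    if step == 1:
        # A permanent error, or temporary errors that need investigation.
        return "perform Step 2" if error_found else "no further action"
    if step == 2:
        if source in HARDWARE_SOURCES:
            return "call service representative"
        if source == "VOL":
            return "perform Step 3"
        raise ValueError(f"unknown probable failing unit: {source}")
    if step == 3:
        return "perform Step 5" if location_complete else "perform Step 4"
    if step == 4:
        return "perform Step 5"
    if step == 5:
        return "follow Appendix B guidelines for the DASD type"
    raise ValueError(f"no such step: {step}")


print(next_action(2, source="DEV"))
print(next_action(3, location_complete=False))
```

Writing the flow down this way can help when documenting local procedures, because each branch has to be stated explicitly.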


Step 1: Detecting the Error 
Generating System Exception Reports daily should be part of your routine 
maintenance process. Even if an error situation is reported by a user or noted from 
a system console message, you need to confirm the occurrence of a permanent or 
temporary error. For permanent errors, review the System Error Summary (Part 2). 
Use the Subsystem Exception DASD report to determine number and frequency of 
both permanent and temporary errors. 


Depending on the recoverability status of the error (permanent or temporary) and 
the repeatability of the error, you can choose to defer further handling. The 
seriousness of temporary data errors is related to their number, frequency, and 
concentration in specific locations. The Subsystem Exception DASD report specifies
the total number of temporary and permanent errors.


Note: Permanent data errors with a probable failing unit of VOLUME require 
immediate investigation. 


Step 2: Determining the Error Source with the Subsystem Exception Report 
Use the DASD Subsystem Exception report to determine whether the probable 
failing unit is the hardware (listed as CHANNEL, STORAGE CONTROL, 
CONTROLLER, and DEVICE) or if it is disk media (listed as VOLUME). 


If the source of the error is hardware, save the DASD Subsystem Exception report 
output and contact your IBM service representative. 


If the source of the error is VOLUME, determine the location and nature of the error 
(Step 3). 


In your review of the DASD Subsystem Exception report, you can determine the total 
permanent and temporary errors for each volume ID. When a volume probable 
failing unit has temporary errors, you must decide whether the number of temporary 
errors is within acceptable limits for your processing complex. 


It is important to consider the DASD type when looking at numbers of temporary
errors:


• For the 3330, 3340, 3344, 3350, and those 3370s that are attached directly to a
4321, 4331, or 4361, the value given in the EREP report description for total
temporary errors (as many as 9999) is the actual number of temporary errors
that occurred for that volume.


• For the 3375, 3380, 9332, 9335, and those 3370s that are not attached directly to a
4321, 4331, or 4361, the value given in the EREP report description in the
temporary column indicates the number of times the subsystem reported
temporary errors.


When a temporary data check occurs, the affected data is available for processing, 
so the operation in progress continues without interruption. At issue is whether the 
time and resources required for the subsystem and system to execute error 
recovery procedures is impacting performance. In general, the impact of the 
temporary error recovery process is not regarded as significant enough to take time 
and system resources required for immediate media maintenance— unless the 
frequency of temporary data checks is excessive or they occur repeatedly with 
frequently used data. 


Step 3: Determining Location and Nature of Errors with the DASD Data Transfer
Summary
Use the DASD Data Transfer Summary report to determine the track or block 
addresses where data errors occurred on a given volume. 


The DASD Data Transfer Summary contains error information for a specific time 
period. The amount of time that elapses between EREP report executions can affect 
the scope of the data included. Examining DASD Data Transfer Summary reports 
that collectively span a greater time period might provide a perspective that is not 
apparent from a single report. 


The information on the DASD Data Transfer Summary might not always provide 
sufficient information because the application program has not referred to tracks 
that might need to be read to provide complete failure source analysis. Use the 
Device Support Facilities ANALYZE SCAN command (Step 4) to get more complete 
information for evaluating error occurrences, and to supplement DASD Data 
Transfer Summary data in the following situations:


• Permanent data checks have been reported.


• The data checks appear to be following a pattern; for example, involvement of a
single head or a small range of cylinders.


• The report does not identify locations for temporary errors (applicable only to
certain disk storage types).


If the location information on the DASD Data Transfer Summary is complete and 
none of the conditions above exists, perform media maintenance (Step 5). 
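

One way to spot the pattern mentioned in Step 3 (a single head or a small range of cylinders) is sketched below. The function name and the `max_cylinder_span` threshold are assumptions chosen for illustration; the manual does not define what counts as a small range.

```python
def describe_pattern(addresses, max_cylinder_span=10):
    """Report a simple pattern among (cylinder, head) failure addresses.

    Returns "single head", "small cylinder range", or None.
    max_cylinder_span is an illustrative threshold, not a documented value.
    """
    if len(addresses) < 2:
        return None  # one address cannot form a pattern
    cylinders = [cyl for cyl, _ in addresses]
    heads = {head for _, head in addresses}
    if len(heads) == 1:
        return "single head"
    if max(cylinders) - min(cylinders) <= max_cylinder_span:
        return "small cylinder range"
    return None


# Addresses from Figure 22: volume ICOM15, then volume ASCR84.
print(describe_pattern([(3, 8), (18, 8)]))
print(describe_pattern([(708, 0), (4, 3), (83, 7)]))
```

In the Figure 22 data, both failures on volume ICOM15 involve head 08, which is the kind of clustering that suggests supplementing the report with ANALYZE SCAN.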


Step 4: Supplementing Error Information with the ANALYZE Command 
Use the ANALYZE command of the Device Support Facilities, specifying DRIVETEST 
(for nonremovable media DASD) and, optionally, SCAN to obtain additional error 
information. It is not always necessary that ANALYZE SCAN operate on the entire 
volume:


• For a permanent data check, limit ANALYZE SCAN to a range of about 10 cylinders
that starts 5 cylinders before the cylinder with the error and ends 5 cylinders
beyond it. For example, if the error is on cylinder 55, head 6, then run
ANALYZE SCAN from cylinder 50 to cylinder 60.


• For errors clustering around a single head, limit the scan to cylinders for that
head only.


ANALYZE reports the tracks on which data checks are detected. Use the error
information reported by ANALYZE, along with the information on the DASD Data
Transfer Summary, in performing media maintenance (Step 5).
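

The cylinder range suggested in Step 4 for a permanent data check can be computed as in the sketch below. The function name is hypothetical, and `last_cylinder` is an assumed input (the highest cylinder number on the volume) that varies by DASD type; clamping at the volume boundaries is an addition for safety, not something the manual specifies.

```python
def scan_cylinder_range(error_cylinder: int, last_cylinder: int,
                        margin: int = 5) -> tuple:
    """Return (first, last) cylinders for a limited ANALYZE SCAN.

    The range extends `margin` cylinders on either side of the error and is
    clamped to the volume boundaries.
    """
    if not 0 <= error_cylinder <= last_cylinder:
        raise ValueError("error cylinder is outside the volume")
    first = max(0, error_cylinder - margin)
    last = min(last_cylinder, error_cylinder + margin)
    return first, last


# The manual's example: an error on cylinder 55 gives cylinders 50 to 60.
# 884 is an assumed last-cylinder value used only for illustration.
print(scan_cylinder_range(55, 884))
```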


If ANALYZE reports a suspected drive problem, save the ANALYZE output and call 
your IBM service representative. 


Step 5: Perform Media Maintenance 


Specific error recovery instructions are provided in Appendix B, “Specific
Guidelines by DASD Type” on page 81 for using Device Support Facilities for each
DASD type.


To use these guidelines effectively, you first need to establish the category of error 
condition that exists. The error conditions to be treated are: permanent versus 
temporary; the number of errors on the volume; and whether cylinder/head 
addresses are known or unknown. 


1. Using the volume serial number obtained from the Subsystem Exception DASD 
report, find the same volume serial number on the DASD Data Transfer 
Summary report. Then, confirm that the type and physical address are the 
same as those for the volume in the DASD Subsystem Exception report. 


2. Determine how many times FAILURE AT ADDRESS appears for that volume. 
This count gives you the number of track or block addresses with data errors on 
that volume. You need to perform media maintenance on all or part of these
tracks or blocks. This varies by DASD type.


3. For each track or block address, check to see if the data error at that address is 
permanent or temporary. 


4. For temporary errors with no cylinder and head specifications, you can
determine the number of errors by referring to the DASD Subsystem Exception 
report. 
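

Item 2 above, counting the FAILURE AT ADDRESS entries for a volume, can be sketched as follows. The function name is hypothetical, and the sketch assumes report text laid out as in Figure 22, with a UNITADDRESS header line per volume; it also counts FAILURE AT BLOCK lines for fixed block devices.

```python
import re
from collections import Counter

VOLUME_RE = re.compile(r"VOLUME (\S+)")


def count_failure_addresses(report_text: str) -> Counter:
    """Count failing track or block addresses listed under each volume."""
    counts = Counter()
    current_volume = None
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if line.startswith("UNITADDRESS"):
            match = VOLUME_RE.search(line)
            current_volume = match.group(1) if match else None
        elif current_volume and line.startswith(
                ("FAILURE AT ADDRESS", "FAILURE AT BLOCK")):
            counts[current_volume] += 1
    return counts


# Abbreviated excerpt in the layout of Figure 22.
sample = """
UNITADDRESS 0B54 DEVTYPE 3350 VOLUME ICOM15
CPU B PHYSICAL ADDRESS 0B54
FAILURE AT ADDRESS: CYLINDER 0003 HEAD 08
FAILURE AT ADDRESS: CYLINDER 0018 HEAD 08
UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091
FAILURE AT ADDRESS: CYLINDER 0341 HEAD 01
"""
print(dict(count_failure_addresses(sample)))
```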


Additional Activities 


After performing the recovery actions specified in Step 5 in the error handling 
process, review the DASD Data Transfer Summary report on subsequent days to 
verify that the tracks or blocks that received maintenance do not have recurring 
errors. 


Remember, if your recovery actions have not corrected the problem, call your IBM 
service representative for assistance and have the following material available: 


• System Exception Reports obtained prior to recovery actions
• Device Support Facilities output
• System Exception Reports obtained following recovery actions.


Maintaining IBM Storage Subsystem Media 


Chapter 5. Device Support Facilities 


This chapter describes some functions and commands of Device Support Facilities 
(often referred to as ICKDSF), which are fundamental tools in diagnosing and 
handling DASD media errors. 


Device Support Facilities is an IBM program provided for use by customers to 
perform volume formatting, or initialization and disk surface (media) maintenance 
functions. This chapter guides you in using disk surface maintenance commands 
and functions effectively. Given the versatility of Device Support Facilities and the 
fact that it operates on all models of IBM DASD, there are many combinations of 
commands and parameters that are device specific. Refer to Device Support 
Facilities User's Guide and Reference for complete information. For specific 
considerations for IBM 3380 and 3390 DASD, see Device Support Facilities: Primer 
for the User of IBM Direct Access Storage. 


Device Support Facilities Release Level 


The material presented in this chapter assumes that you are using Device
Support Facilities Release 11.0 or higher.


Using the information in this chapter requires an understanding of DASD physical
characteristics, which are discussed in “Characteristics of an Error” on page 15.


Overview of Media Maintenance 


Device Support Facilities consists of various commands, with multiple parameters 
associated with each command to tailor it for a specific operation. Various media 
maintenance functions associated with the following commands are discussed here: 


ANALYZE      Used to perform hardware and data verification tests, and to report
             the location of potential error sites.

INSPECT      Used for checking the media surface to detect and bypass potential
             error sites.

INIT         Used for rewriting home addresses (HA), record zeros (R0), and the
             volume label on the volume.

INSTALL      Used to perform tasks necessary for installation, head-disk
             assembly (HDA) replacement, and physical movement of IBM 3380
             and 3390 direct access storage devices.

             Also used for conversion of 3390 models from “3390 mode” to “3380
             track compatibility mode,” or vice versa. See “Using INSTALL to
             Change Modes on a 3390” on page 68.

REVALIDATE   Combines track validation, problem determination, and data
             verification functions, and surface checking functions, if required.


Only applicable parameters of each of these commands are discussed in this 
chapter. Defaults are mentioned only when relevant. Guidelines for command use 
in performing media maintenance for specific error conditions are provided by 
DASD type in Appendix B, “Specific Guidelines by DASD Type” on page 81. 


Device Support Facilities operates on one device at a time. However, more than
one copy of Device Support Facilities can be used simultaneously to operate on 
other devices. All commands require that a device be specified. This is done with 
the UNIT, DDNAME, or SYSNAME parameter, depending upon your environment and 
the disposition of the DASD. Device specification is not included in the discussions 
that follow. 


Commands can be conditionally combined into one Device Support Facilities
invocation by use of the IF-THEN-ELSE sequence of commands. As an example, you
can run an INIT (initialize) and follow it with an ANALYZE only if the INIT is
successful, as follows:


INIT UNIT(ccuu) NOVFY VALIDATE DATA VOLID(volser) 
IF LASTCC < 8 THEN - 
ANALYZE UNIT(ccuu) DRIVETEST SCAN 


See Device Support Facilities User’s Guide and Reference for details on command 
syntax. 


Establishing Guidelines for Performing Media Maintenance 

It is important that you take note of devices at your data processing center that have
critical data and applications before scheduling media maintenance. You might not
want to schedule any media maintenance procedures at a peak period of your
business day, or you might choose to defer maintenance on specific volumes. It is a
good idea to develop a strategy for performing media maintenance to allow for
minimal interruption to your daily routine. For more information on identifying
critical data on your DASD, see the appropriate Storage Subsystem Library
operating environment manual listed in the “Bibliography” on page 121.


Using Concurrent Media Maintenance 


Concurrent media maintenance provides you with the ability to access your data 
while Device Support Facilities performs the surface checking function on the track 
in error. If you are running INSPECT on a 3380 or 3390 DASD attached to a 3990 
Model 2 or 3 with the correct level of microcode, the surface checking function is 
accomplished by temporarily moving the data to an alternate track and directing I/O 
to that track during the Device Support Facilities operation on the original track. 
The data is returned to the original track when the operation completes. If the 
original track is defective, the data will remain on the alternate track and will not be 
restored to the original track. 


Concurrent media maintenance, during the INSPECT command, minimizes the 
disruption to your operation and eliminates the need for your data to be unavailable 
due to media maintenance actions. 


It is possible to further minimize the impact of performing media maintenance on 
system performance by using the INSPECT command (with concurrent media 
maintenance functions) in conjunction with the IODELAY command. The purpose of 
the IODELAY command is to “slow down” INSPECT processing by allowing a delay 
time in between I/O processing rather than issuing consecutive I/O operations. This 
will increase the run time of the INSPECT operation. 


For more information on using the concurrent media maintenance function, and the
IODELAY command, see the Device Support Facilities User’s Guide and Reference.


You should review current guidelines you’ve established for performing media
maintenance and update them to take advantage of the concurrent media 
maintenance capability. Performing media maintenance on high activity volumes 
can impact system performance, so you should consider this when scheduling 
maintenance tasks. 


Performing Media Maintenance on Dual Copy Volumes 


With ICKDSF Release 12 and the correct level of microcode installed on the 3990, 
the ANALYZE and INSPECT commands allow processing on a volume in duplex 
state or suspended duplex state. When performing media maintenance on a volume 
that is part of a dual copy pair, the user may first need to determine whether it is the 
primary volume or secondary volume that requires media maintenance. 


By using the DIRECTIO parameter with the ANALYZE or INSPECT commands, 
Device Support Facilities provides a means for I/O to be directed to either the 
primary volume or the secondary volume without having to put the volumes into 
simplex mode. 


For more information on command parameters and defaults, and dual copy support, 
see Device Support Facilities User’s Guide and Reference. For additional 
information on dual copy support, see IBM 3990 Planning, Installation and Storage
Administration Guide. 


Performing Media Maintenance in VM


VM users should be aware that some Device Support Facilities commands and/or
parameters are valid only for devices that are dedicated to a virtual machine.
However, VM operating systems with CMS minidisk support for media maintenance
(DIAGNOSE code X'E4' with subcodes X'02' and X'03') provide additional
capabilities. On these systems the CMS command form of ICKDSF can perform
ANALYZE, without DRIVETEST, and INSPECT functions against a minidisk for which
there are no outstanding write links. Users may continue to access other minidisks
on the volume concurrent with the media maintenance procedure.


For more information on using Device Support Facilities in a VM environment, see
Device Support Facilities User’s Guide and Reference.


The ANALYZE Command 


The ANALYZE command is used to examine a device and/or the data on a volume to 
help determine the existence and the nature of errors. There are two parts to the 
command: the drive test, for nonremovable media only, (invoked by the DRIVETEST 
parameter) and the scan (invoked by the SCAN parameter). DRIVETEST and SCAN 
can be invoked independently or together. 


The drive test applies only to nonremovable media. DRIVETEST performs 
fundamental tests to ensure that device hardware can perform basic operations, 
such as seeks, reads, and writes. DRIVETEST is not disruptive to user operation. 


ANALYZE on Dual Copy Volumes


When using the ANALYZE command to process dual copy volumes in duplex state
or in suspended duplex state, you can use the DIRECTIO(PRIMARY) or
DIRECTIO(SECONDARY) parameters to select either the primary or secondary
volume. The UNITADDRESS parameter or DDNAME parameter must always point to
the primary volume. See Device Support Facilities User's Guide and Reference for
additional information, restrictions, and defaults when processing dual copy
volumes.


How to Use ANALYZE DRIVETEST
You invoke DRIVETEST by specifying:


ANALYZE UNIT(ccuu) DRIVETEST 


You can also use ANALYZE DRIVETEST to identify the path to the device where an
error occurred. If you specify the path control parameter, you can direct Device
Support Facilities to process the drive test down every channel path, or limit
processing to one or more specific paths. Path control is available for any DASD that
attaches to the 3990. For more information on path control, see the Device Support
Facilities User’s Guide and Reference.


The DRIVETEST parameter is the default when the ANALYZE command is invoked.


Note: Other parameters (not shown here) are available for this command. For 
more information on command parameters, see the Device Support Facilities User’s 
Guide and Reference.


What to Expect from ANALYZE DRIVETEST 


A specific invocation of DRIVETEST produces one of two basic messages. One 
indicates that no drive problems were found. The other indicates a Suspected Drive 
Problem.


A Suspected Drive Problem message means that an error condition has been 
detected and that you need to call an IBM service representative. 


Additional information contained in the ANALYZE output is provided to supplement 
the information used in the problem determination process. It is important to save 
this output and provide it to the service representative to aid in resolving the 
problem. No data is recorded in the ERDS during ANALYZE DRIVETEST processing. 


Your service representative might ask for additional Device Support Facilities 
functions to be run as part of the repair process. 


How ANALYZE SCAN Works 


SCAN reads data that currently exists on a volume. If SCAN reads the data 
successfully the first time, no further rereading of the track takes place. If a data
check is detected on the first read, further reads of the data are issued to establish 
that the data check is not a random occurrence and that a message should be 
reported for the track. 


Data is read with subsystem and error recovery processes disabled to allow SCAN 
to identify all data checks. Data is never recorded in the ERDS during ANALYZE 
SCAN processing. SCAN has no effect on user data on the volume.
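

The read-then-reread behavior described above can be pictured with a short Python sketch. It is an illustration only, not the logic inside Device Support Facilities: the `read_track` callable, the number of rereads, and the rule that one repeated data check is enough to report the track are all assumptions made for the example.

```python
def scan_track(read_track, rereads=3):
    """Return True when the track would be reported by this sketch.

    read_track is a hypothetical callable that returns True for a clean
    read and False when a data check occurs.
    """
    if read_track():
        # First read succeeded: no further rereading takes place.
        return False
    # First read hit a data check: reread to see whether it repeats.
    repeats = sum(1 for _ in range(rereads) if not read_track())
    return repeats > 0


def make_reader(results):
    """Build a stub reader that replays a fixed list of read results."""
    pending = iter(results)
    return lambda: next(pending)


print(scan_track(make_reader([True])))
print(scan_track(make_reader([False, True, True, True])))
print(scan_track(make_reader([False, True, False, True])))
```

Because the first read decides whether any further analysis happens, a data check with low repeatability can go unnoticed on one run and be reported on the next.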




How to Use ANALYZE SCAN 
SCAN is invoked by specifying: 


ANALYZE UNIT(ccuu) SCAN 
The SCAN parameter initiates the scan function and must be specified. 


Defaults and Optional Parameters: 


• Limits can be placed on the area of data that is scanned by using the range
parameters (TORANGE, FROMRANGE, CYLRANGE, HEADRANGE, LIMITS).
ALL is the default limit parameter if SCAN is specified, as in the sample
provided.


• SPEED can be specified to read an entire cylinder with each I/O. Caution is
advised when using the SPEED option when other activity is heavy on the
channel, because performance on the channel could be degraded. NOSPEED,
the default, reads one track at a time. Because SPEED was not specified in the
syntax shown, NOSPEED is used.


• The sample invocation provided for ANALYZE SCAN will first perform the drive
test function, because DRIVETEST is also defaulted. The drive test can be
bypassed by specifying NODRIVETEST.
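

To show how these defaults combine, the following Python sketch assembles the text of an ANALYZE statement from the parameters named above. The helper `build_analyze` is hypothetical and exists only for this example; it leaves out defaulted parameters (DRIVETEST, NOSPEED, ALL), and the order in which it writes the keywords is an assumption. Check the resulting statement against the Device Support Facilities User's Guide and Reference before using it.

```python
def build_analyze(unit, scan=True, drivetest=True, speed=False):
    """Compose ANALYZE statement text, leaving defaulted parameters out."""
    if not scan and not drivetest:
        raise ValueError("request the drive test, the scan, or both")
    parts = ["ANALYZE", f"UNIT({unit})"]
    if not drivetest:
        # DRIVETEST is the default, so only the bypass is written out.
        parts.append("NODRIVETEST")
    if scan:
        parts.append("SCAN")
        if speed:
            # NOSPEED (one track per I/O) is the default.
            parts.append("SPEED")
    return " ".join(parts)


print(build_analyze("ccuu"))
print(build_analyze("ccuu", drivetest=False, speed=True))
```

The first call reproduces the sample invocation shown earlier, which runs the drive test and then scans one track at a time.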


Note: Other parameters (not shown here) are available for this command. For 
more information on command parameters, see the Device Support Facilities User's 
Guide and Reference. 


What to Expect from ANALYZE SCAN 


SCAN presents messages to indicate that the error is either correctable 
(ECC-correctable) or uncorrectable (ECC-uncorrectable). SCAN cannot distinguish 
between temporary and permanent data checks because it operates with all levels 
of recovery disabled. An uncorrectable data check message is not an indication that 
the error would be permanent to the application when the data is accessed during 
normal processing and recovery procedures. For example, the data might be 
readable using the offset recovery process. 


ANALYZE SCAN provides specific track messages. In addition, a table indicating
which heads have experienced an error is presented.


Notes on Invocations of SCAN 


A data check must be detected the first time a track is read in order for further 
analysis of the track to take place. Any low repeatability data check can be 
detected on any given SCAN. Therefore, the following conditions can occur: 


• Multiple runs of SCAN can produce messages regarding different tracks.
(Data checks occurring as a result of additional runs are likely to be low
repeatability data checks.)


• Tracks that are known to have experienced data checks might not be
reported by ANALYZE SCAN.


ANALYZE SCAN contains logic to monitor data check information, and can produce 
a Suspected Drive Problem message when needed. It is important to save all output 
and provide it to the IBM service representative to aid in resolving the problem. 


The INSPECT Command


The INSPECT command is used for two primary purposes:


• Surface checking a track


• Rewriting data


If you are invoking INSPECT on a device not attached to a 3990 Model 2 or 3, 
customer workload scheduling must allow Device Support Facilities exclusive 
control of the track being processed. For DASD attached to the 3990 Model 2 or 3 
with the correct level of microcode installed, you have the advantage of using 
concurrent media maintenance. 


The INSPECT command uses data protection mechanisms available for each 
operating environment, and it locks out other processors if the DASD is shared. 
However, the processor affected by the INSPECT command can continue to access 
all tracks on the device. 


You must specify either VERIFY or NOVERIFY for every invocation of the INSPECT
command. VERIFY provides additional control to ensure that the existing volume 
serial number and/or owner identification on the volume are correct. NOVERIFY 
bypasses this checking, and is used in all sample invocations of INSPECT shown 
here. 


INSPECT on Dual Copy Volumes


The DIRECTIO(PRIMARY) and DIRECTIO(SECONDARY) parameters are provided to 
allow processing on dual copy volumes in duplex state or suspended duplex state. 
When in duplex state, only DIRECTIO(PRIMARY) is allowed, which selects the
primary volume to be processed. When in suspended duplex state, either
DIRECTIO(PRIMARY) or DIRECTIO(SECONDARY) is allowed, to select either the
primary or secondary volume to be processed.


Two copies of each track exist and, throughout the media maintenance process, the
data is preserved. However, to ensure proper execution, the NOPRESERVE
parameter must always be used when processing the secondary volume. The
PRESERVE parameter must always be used when processing the primary volume.
If errors are present using PRESERVE on the volume that is the active primary, then
the dual copy pair can be split and NOPRESERVE can be used on the volume in
simplex state. If duplexing has been suspended, when it is made active again, the
data will be preserved from the good copy on the primary.


The UNITADDRESS parameter or DDNAME parameter must always point to the 
primary volume. See Device Support Facilities User’s Guide and Reference for 
additional information, restrictions, and defaults when processing dual copy 
volumes. 
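

The rules in this section can be summarized in a small Python sketch. The function `inspect_dual_copy_options` is hypothetical and is not part of Device Support Facilities; it encodes only the combinations stated above and ignores any further restrictions documented in the Device Support Facilities User's Guide and Reference.

```python
def inspect_dual_copy_options(state, target):
    """Return the DIRECTIO and PRESERVE keywords for a dual copy volume.

    state  -- "duplex" or "suspended" (suspended duplex state)
    target -- "PRIMARY" or "SECONDARY"
    """
    if state not in ("duplex", "suspended"):
        raise ValueError("state must be 'duplex' or 'suspended'")
    if target not in ("PRIMARY", "SECONDARY"):
        raise ValueError("target must be 'PRIMARY' or 'SECONDARY'")
    if state == "duplex" and target == "SECONDARY":
        raise ValueError("only DIRECTIO(PRIMARY) is allowed in duplex state")
    # PRESERVE goes with the primary volume, NOPRESERVE with the secondary.
    preserve = "PRESERVE" if target == "PRIMARY" else "NOPRESERVE"
    return f"DIRECTIO({target}) {preserve}"


print(inspect_dual_copy_options("duplex", "PRIMARY"))
print(inspect_dual_copy_options("suspended", "SECONDARY"))
```

In either case the unit or DDNAME specification on the command still points to the primary volume.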


How INSPECT Works for Surface Checking 


Surface checking is performed to determine the repeatability and visibility of data 
checks. The degree of surface checking performed is determined by the input 
parameters and the characteristics of the specific device type. 


In order to reliably locate low-repeatability, low-visibility data checks, INSPECT 
performs multiple write operations on a given track or block. 


Notes on using INSPECT with Concurrent Media Maintenance
If INSPECT is processing using concurrent media maintenance functions and 
terminates before the task is completed, you should note the following: 


• Your data remains on the alternate track and can be accessed from there until a
subsequent INSPECT is run for that device.


• If a subsequent INSPECT is started from the same processor, processing
continues for the track that originally received the failure.


• If a subsequent INSPECT is started from a different processor (in a shared
environment), use the FORCE parameter to support recovery of a prior
concurrent media maintenance failure from another processor. This is
necessary to prevent multiple INSPECT jobs from different processors
accessing the same track simultaneously.


CAUTION: Only use the FORCE parameter to recover from a prior concurrent 
media maintenance failure on another processor. Misuse of this parameter can 
cause data integrity problems when two INSPECT jobs are running 
simultaneously. 


Count, Key, Data (CKD) Devices 
This section describes how surface checking works for devices that use the 
count-key-data format. 


3340, 3344, 3350, 3375, 3380, 3390: Most media related data checks are the result of
a small defective area on the surface of a track. This area can be skipped over by
the DASD subsystem, and is referred to as a skip displacement. When a skip
displacement is assigned to a track, data written on that track straddles the
displaced location. An area reserved at the end of the track is used to replace the
skipped area, and the track capacity remains constant.
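

As a rough picture of how data straddles a displaced location, the following Python sketch maps a position in the user data to a position on the track surface. It is a simplified model for illustration only. The byte offsets, the skip lengths, and the function `physical_offset` are invented for the example and do not describe the actual track format or subsystem logic.

```python
def physical_offset(logical, skips):
    """Map a logical data offset to a physical offset on the track.

    skips is a list of (start, length) pairs, sorted by start, giving the
    physical areas that are skipped over. All values are hypothetical.
    """
    physical = logical
    for start, length in skips:
        if physical >= start:
            # Data that would land on or past the defect moves beyond it.
            physical += length
        else:
            break
    return physical


skips = [(100, 10), (200, 5)]
for logical in (50, 100, 190):
    print(logical, physical_offset(logical, skips))
```

In this model every skipped area pushes the remaining data toward the end of the track, which is why space must be reserved there for the usable capacity to stay the same.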


Device Support Facilities surface checking is invoked by executing INSPECT, and is 
designed to detect all error sites that might produce uncorrectable or correctable 
data checks when user data is stored on the track. The maximum number of skip 
displacements that a track can have varies by device type: 


Device Skip Displacements 


3340 
3344 
3350 
3375 
3380 
3390 


NNN WOR eR 


If more than the maximum number of skip displacements are needed, an alternate
track assignment can be made.


Skip displacement processing has been designed to detect and skip displace all 
locations that are determined to have the potential of causing a data check. The 
skip displacement algorithm is extremely sensitive. In fact, skip displacements 
might be assigned to locations that were not experiencing data checks. 


Skip displacements are not necessarily an indication that normal running conditions
would encounter any data checks. More important, they are not an indication that 
an error site would be a detriment to the running of any application against user 
data on that track. 


3330: Surface checking is done for the 3330 by specifying CHECK(n) on the 
INSPECT command, where 'n' specifies the number of times the track is examined 
for data checks. An alternate track is assigned if an uncorrectable data check is 
detected. Alternate tracks reside beyond the last cylinder on the volume. For a
correctable data check, a message is issued. If an alternate track is required for a 
correctable data check, INSPECT can be used to unconditionally assign an alternate 
track. 


Fixed Block Architecture (FBA) Devices
This section describes how surface checking works for devices that use fixed-block 
architecture format. 


3370, 9335: Surface checking is done for the 3370 and 9335 by specifying CHECK(n) 
on the INSPECT command, where 'n' specifies the number of times the block is 
examined for data checks. An alternate block is assigned for any data check that is 
detected. Alternate blocks reside on the same cylinder, or on a nearby cylinder and 
generally cause no measurable performance degradation. 


9332: No surface checking is available for the 9332. INSPECT can be used to
unconditionally assign an alternate block if subsystem messages direct you to do
so.


How to Use INSPECT for Surface Checking 


Surface checking for a 3380 or 3390 volume is invoked by using the INSPECT
command with the SKIP parameter:


INSPECT UNIT(ccuu) NOVERIFY SKIP TRACKS(cccc,hhhh) 


The SKIP parameter invokes the skip displacement surface checking. The TRACKS 
parameter specifies the track to be processed. 


Notes on INSPECT parameters 


When using the NOSKIP parameter, if a data check is detected on a track during 
primary checking, skip displacement is performed for that track. 


Other parameters (not shown here) are available for the INSPECT command. 
For more information on command parameters, see the Device Support 
Facilities User’s Guide and Reference. 


Defaults and Optional Parameters: 


• PRESERVE is defaulted to ensure existing data is restored to the primary track
when processing is complete. With DASD attached to the 3990 Models 2 or 3
with the correct level of microcode installed, concurrent media maintenance
functions are also defaulted. If PRESERVE is specified, and the track is
exhibiting permanent data checks that prevent the successful reading of the
data, no processing is done on the track.

INSPECT can be rerun specifying NOPRESERVE to skip displace the permanent
data check location on the original track. NOPRESERVE erases the data on the
track. If NOPRESERVE is used, the data on the track must be restored using
recovery procedures for your computing complex.


• The CHECK(1) parameter is the default in the syntax sample shown. This
ensures that surface checking procedures are in effect for this invocation.


• ASSIGN is a default in the sample shown, and allows an alternate track to be
automatically assigned if more than seven skip areas are required on the track.


• This sample invocation of INSPECT operates on one track. Multiple tracks can
be specified in a single invocation.
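

The interaction between skip displacements and the ASSIGN default can be sketched in Python. The function `surface_check_outcome` and its return strings are hypothetical. The limit of seven applies to the 3380 or 3390 sample shown, and the result given when ASSIGN is not in effect is an assumption, because the text above describes only the default case.

```python
MAX_SKIPS_SAMPLE = 7  # limit quoted above for the 3380 or 3390 sample


def surface_check_outcome(sites_needing_skip, assign=True,
                          max_skips=MAX_SKIPS_SAMPLE):
    """Describe what this sketch expects for one track after checking."""
    if sites_needing_skip == 0:
        return "no skip displacement needed"
    if sites_needing_skip <= max_skips:
        return "skip displacements assigned"
    if assign:
        return "alternate track assigned"
    # Assumed outcome when automatic assignment is not in effect.
    return "alternate track needed but not assigned automatically"


for sites in (0, 3, 7, 8):
    print(sites, surface_check_outcome(sites))
```

Other device types have different limits, so `max_skips` should be set from the table in “Count, Key, Data (CKD) Devices” rather than left at the sample value.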


What to Expect from INSPECT When Surface Checking 


INSPECT prints a message about any track for which it assigns a skip displacement 
and/or assigns an alternate track. It also provides a summary of any currently 
assigned alternate tracks. 


Skip displacement processing requires approximately one minute per track without 
the concurrent media maintenance function. INSPECT does multiple writes and 
reads, and the track is unavailable for other use until processing on it completes. 
Concurrent media maintenance allows you to access the data on the track during 
this time. 


Rewriting Data with INSPECT 


During high-activity workload periods, it might not be acceptable for a track to be 
unavailable for the time required to run INSPECT SKIP CHECK(1). However, with 
concurrent media maintenance, you can run INSPECT SKIP CHECK(1) and continue 
to access your data. In those cases in which you do not have access to concurrent 
media maintenance functions, you can rewrite data with INSPECT NOSKIP 
CHECK(1) and then schedule INSPECT SKIP CHECK(1) for a later time, if necessary. 
This rewriting process provides for possible straddling of small defects, and it also 
eliminates those data checks that were caused by hardware errors during writing or 
reading (assuming no hardware repair action was required, or has already been
performed).


Notes on the NOSKIP parameter 


When using the NOSKIP parameter, if a data check is detected on a track during
primary checking, skip displacement is performed for that track.


How To Use INSPECT to Rewrite Data 
The INSPECT command can be used to read and rewrite the data on a track. 
Existing data is read from the track. The surface of the track is checked for high 
repeatability, high visibility error sites, and the data is rewritten to the track. 


The process of rewriting data is invoked with the INSPECT command using the
CHECK(1) parameter, and using the NOSKIP parameter for CKD devices that
support skip displacement processing. For devices that do not support skip
displacement processing, the procedure is identical to the surface checking
process; see “How INSPECT Works for Surface Checking” on page 62.


INSPECT UNIT(ccuu) NOVERIFY NOSKIP TRACKS(cccc,hhhh)


The NOSKIP parameter invokes the rewrite procedure.


The NOVERIFY parameter allows INSPECT to run on this device regardless of 
volume serial number and owner ID of this volume. 


The TRACKS parameter specifies the track to be processed. 


Note: Other parameters (not shown here) are available for this command. For 
more information on command parameters, see the Device Support Facilities User's 
Guide and Reference. 


Defaults and Optional Parameters: 


• PRESERVE is a default and therefore is not shown in the sample. It ensures the
rewrite procedure.


• The CHECK parameter, which is also a default, not explicitly shown in the
sample, ensures that the rewrite procedure is invoked.


• For an FBA device, specify BLOCKS (instead of TRACKS) and eliminate the
NOSKIP parameter.


• The command sample shown operates on one track. Multiple tracks or blocks
can be specified in the same invocation.


What to Expect from INSPECT When Rewriting Data 


If an error site is detected during any part of the rewrite procedure, surface 
checking procedures applicable to the device types are automatically invoked for 
the track or block. (See “How INSPECT Works for Surface Checking” on page 62.) 


INSPECT provides a summary of all the currently assigned alternate tracks. 


The INIT Command 


The INIT command, which performs the initialize function, always writes a volume 
label, a volume table of contents (VTOC), and other items required for using 
volumes in specific operating environments. These functions are referred to as 
minimal initialization and only minimal initialization is supported on dual copy 
volumes. 


Access to existing data through a previous VTOC is destroyed by use of the INIT
command. The INIT command should be followed by a command that will format
the volume for the operating system that will be using it. For MVS and VSE volumes
or minidisks, use the INIT command to perform a minimal initialization to write a
volume label and a VTOC on the volume or minidisk. For VM volumes, you may use
the ICKDSF CPVOLUME command to FORMAT/ALLOCATE cylinder zero or the
entire volume.


For CKD devices, the INIT command can be used to rewrite the home address and
record zero fields for all tracks on a volume. This process is called medial
initialization, and it includes all functions provided by a minimal initialization.


For FBA devices, INIT can be invoked to reclaim alternate blocks for the entire
volume. This procedure is called maximal initialization and includes all the
functions provided by minimal initialization.
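

The three levels can be compared side by side with a short Python sketch that writes out the INIT statement for each one. The helper `build_init` is hypothetical. The medial and maximal forms follow the samples given later in this chapter, while the minimal form, shown here as the same statement with no additional parameters, is an assumption to confirm in the Device Support Facilities User's Guide and Reference.

```python
def build_init(unit, volid, level="minimal"):
    """Compose INIT statement text for one initialization level."""
    parts = ["INIT", f"UNIT({unit})", "NOVERIFY", f"VOLID({volid})"]
    if level == "medial":
        # CKD devices: rewrite home address and record zero on every track.
        parts += ["VALIDATE", "NODATA"]
    elif level == "maximal":
        # FBA devices: surface check and reclaim alternate blocks.
        parts += ["CHECK(3)", "RECLAIM"]
    elif level != "minimal":
        raise ValueError("level must be 'minimal', 'medial', or 'maximal'")
    return " ".join(parts)


for level in ("minimal", "medial", "maximal"):
    print(build_init("ccuu", "volser", level))
```

Each higher level includes the work done by a minimal initialization, so the statements differ only in the parameters added at the end.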


Customer workload scheduling must allow Device Support Facilities exclusive
control of the volume being processed. The INIT command uses data protection 
mechanisms available for each operating environment, and locks out other 
processors if the DASD is shared. However, INIT cannot guarantee Device Support 
Facilities exclusive access to the volume from the same processor. 


You must specify either VERIFY or NOVERIFY for every invocation of the INIT 
command. VERIFY provides additional control to ensure that the existing volume 
serial number and/or owner identification on the volume are correct. NOVERIFY 
bypasses this checking, and is used in all samples shown here. 


Rewriting Home Address (HA) and Record Zero (R0) with INIT


INIT run at the medial level rewrites the home address and record zero on every 
primary and alternate track on the volume. All other data on the volume is erased. 


How To Use INIT for Rewriting HAs and R0s (Medial Initialization)


Using the VALIDATE parameter with the INIT command starts medial initialization: 


INIT UNIT(ccuu) NOVERIFY VOLID(volser) VALIDATE NODATA 


The NOVERIFY parameter allows INIT to run on this device regardless of the
volume serial number and owner ID of this volume.


The VALIDATE parameter forces the home address and record zero on every track 
to be rewritten. 


The NODATA parameter ensures that all tracks not formatted in the minimal
initialization process contain only a home address and a standard record zero at the
completion of processing.


Note: Other parameters (not shown here) are available for this command. For 
more information on command parameters, see the Device Support Facilities User’s 
Guide and Reference. 


Defaults and Optional Parameters: 


• DATA can be specified instead of NODATA. That will write a full track of data on
every track on the volume. That data is a predefined pattern similar to the data
used to certify the volume at the factory. The data is referred to as factory
functional verification data patterns. Note that previously existing data on the
volume is erased.


What to Expect from INIT When Rewriting HAs and R0s (Medial Initialization)
At the completion of processing, INIT provides a summary of the assigned alternate 
tracks. These alternates were already assigned when INIT processing began; no 
alternates are assigned or reclaimed as a result of a medial initialization. 


For a single device, medial initialization takes from 15-60 minutes to execute, 
depending upon the capacity of the device. When multiple devices are being 
initialized concurrently, that time increases based on the number of devices and the 
number of paths. The run time is the same with and without the DATA option. 


Reclaiming Alternate Blocks with INIT


When INIT is run for FBA devices at the maximal level with the RECLAIM option, it
effectively “unassigns” alternate blocks for those blocks that do not experience data
checks during the surface checking procedure. Alternate blocks assigned at the
factory are never reclaimed.


How to Use INIT for Reclaiming Alternate Blocks on FBA Devices 
Use the RECLAIM and CHECK parameters with the INIT command: 


INIT UNIT(ccuu) NOVERIFY VOLID(volser) CHECK(3) RECLAIM 


The NOVERIFY parameter allows INIT to run on this device regardless of the
volume serial number and owner ID of this volume.


The CHECK(3) parameter ensures a sufficient level of surface checking for each 
block on the volume. 


The RECLAIM parameter is required to regain access to blocks that were previously
flagged as defective, if surface checking is successful.


What to Expect from INIT When Reclaiming Alternate Blocks 


The process takes about one hour, depending on the device type. At the completion 
of processing, INIT provides a summary of the assigned alternate blocks. This 
function is not supported for the 9332 device. 


The INSTALL Command 
The INSTALL command is used to: 


• Change modes on the 3390 device from 3390 to 3380 track compatibility mode or
vice versa.


• Prepare volumes of the 3380 and 3390 DASD.


The command is an enhanced installation procedure which includes the writing of 
home address and record zero on every track on the volume. It can be used when 
the validation functions of medial initialization are desirable.


The INSTALL functions are invoked by specifying the following command:
INSTALL UNIT(ccuu)


The INSTALL command does not support dual copy volumes. 


Using INSTALL to Change Modes on a 3390 


The 3390 device is capable of functioning in either 3390 or 3380 track compatibility 
mode. In 3390 mode, the entire track capacity of the device is available for use. In 
3380 track compatibility mode, devices are formatted so that their tracks have the 
same capacity as 3380 tracks. You must use the Device Support Facilities INSTALL 
command to change a 3390 to a particular mode. 


You can switch the mode of a 3390 from 3390 mode to 3380 track compatibility mode 
using the SETMODE parameter as follows: 


INSTALL UNIT(ccuu) SETMODE(3380) 


You can switch the mode of a 3390 from 3380 track compatibility mode to 3390 mode 
as follows: 


INSTALL UNIT(ccuu) SETMODE(3390) 


For information on selecting a mode, refer to the Device Support Facilities User’s 
Guide and Reference. Similar information can also be found in: 


• Using IBM 3390 Direct Access Storage in an MVS Environment 
• Using IBM 3390 Direct Access Storage in a VM Environment 
• Using IBM 3390 Direct Access Storage in a VSE Environment 


What to Expect from the INSTALL Command 


After the INSTALL command processes, the volume is not initialized for any 
operating system or operating environment. Alternate tracks are reset and 
reassigned if necessary. 


After a successful completion of the INSTALL command, a minimal initialization 
should be run. 


The REVALIDATE Command 


The REVALIDATE command combines the track validation functions of medial 
initialization with the problem determination and data verification functions of the 
ANALYZE command, and also the INSPECT functions, if required. This command is 
valid on IBM 3380 and 3390 volumes only. REVALIDATE, in one command, 
performs the following combination of functions: 


• A drive test 
• Home address and record zero validation 
• Data verification of the factory functional verification data patterns (FFVDP) 
• Surface checking on tracks if required. 


How to Use REVALIDATE 


The REVALIDATE functions are invoked by specifying the following command: 


REVALIDATE UNITADDRESS(ccuu) VERIFY(serial,owner) 


The UNITADDRESS parameter identifies the device on which the volume is 
mounted. For ccuu, specify the address, in hexadecimal (3 or 4 digits), of the 
channel and unit on which the volume is mounted. 


The UNITADDRESS parameter is required for processing a volume in MVS or in a 
CMS or stand-alone environment. For processing a volume in the VSE environment, 
the SYSNAME parameter must be used. 

The VERIFY parameter is required when you want to verify the volume serial 
number and owner identification before processing the volume. If the volume serial 
number or owner identification does not match that found on the volume, the 
REVALIDATE command terminates. 


For serial, substitute 1 to 6 alphanumeric characters for the volume serial number. 


For owner, substitute 1 to 14 alphanumeric characters for the owner identification. 


When you want to bypass verification of the volume serial number and owner 
identification, the NOVERIFY parameter must be used. 
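

The parameter rules above can be checked before a command is coded. The following 
Python sketch is illustrative only. The function name build_revalidate is 
hypothetical, it covers only the UNITADDRESS, VERIFY, and NOVERIFY parameters as 
described in this section (not the SYSNAME form used in VSE), and it assumes that 
both the serial and the owner are supplied when VERIFY is used. It does not 
replace the syntax rules in the Device Support Facilities User’s Guide and 
Reference. 

```python
import re


def build_revalidate(ccuu, serial=None, owner=None):
    """Build a REVALIDATE command string (hypothetical helper).

    Only the rules stated in this section are checked:
    ccuu is 3 or 4 hexadecimal digits, serial is 1 to 6 alphanumeric
    characters, owner is 1 to 14 alphanumeric characters.
    """
    if not re.fullmatch(r"[0-9A-Fa-f]{3,4}", ccuu):
        raise ValueError("ccuu must be 3 or 4 hexadecimal digits")
    unit = ccuu.upper()

    # No serial and no owner: bypass verification with NOVERIFY.
    if serial is None and owner is None:
        return f"REVALIDATE UNITADDRESS({unit}) NOVERIFY"

    # Assumption of this sketch: VERIFY is coded with both values.
    if serial is None or not re.fullmatch(r"[A-Za-z0-9]{1,6}", serial):
        raise ValueError("serial must be 1 to 6 alphanumeric characters")
    if owner is None or not re.fullmatch(r"[A-Za-z0-9]{1,14}", owner):
        raise ValueError("owner must be 1 to 14 alphanumeric characters")
    return f"REVALIDATE UNITADDRESS({unit}) VERIFY({serial},{owner})"
```

For example, build_revalidate("0141") produces the NOVERIFY form, and 
build_revalidate("1a0", "VOL001", "PAYROLL") produces the VERIFY form with the 
address in uppercase. 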


Note: The device must be on a channel that is online. If the device is on a channel 
that is offline, the program might enter a nonterminating wait state. 


See Device Support Facilities User’s Guide and Reference for more information on 
required parameters and restrictions. 


What to Expect From REVALIDATE 


At the completion of this command, all tracks are formatted for use by IBM 
operating systems. Alternate tracks are reset and reassigned if necessary. 
However, the volume label, VTOC, and all user data are destroyed. The 
REVALIDATE command should be followed by a command that will format the 
volume for the operating system that will be using it. For MVS and VSE volumes or 
minidisks, use the INIT command to perform a minimal initialization to write a 
volume label and a VTOC on the volume or minidisk. For VM volumes, you may use 
the CPVOLUME command to FORMAT/ALLOCATE cylinder zero or the entire 
volume. See “The INIT Command” on page 66 for more information on minimal 
initialization. 


Summary information regarding the results of the drive test and surface checking 
is presented. 


Note 


This command operates on the full volume and destroys all data. It should not be 
used as an alternative when media maintenance actions are required for the 
device unless explicitly directed to do so by media maintenance procedures in 
Appendix B. 


The REVALIDATE command does not support dual copy volumes. 


Appendix A. Background Information 


This appendix provides background information on several topics related to direct 
access storage devices and the structure of the storage subsystem. An 
understanding of this material is helpful for performing media maintenance. 


Resources for Additional Information 


The information in this appendix provides high-level, introductory material on disk 
storage function. In addition to the manuals listed in the bibliographies for this 
manual and other members of the Storage Subsystem Library, there are some other 
types of educational opportunities available from IBM. Consult your IBM 
representative for information on current offerings in classroom education, 
self-study, and technical update video presentations. 


DASD Physical Characteristics 


The various types and models of direct access storage have some common 
attributes related to the physical components of the units and the way these 
components function together to store data. Be sure to refer to the disk storage 
manuals listed in the bibliography for information on the characteristics of specific 
types of DASD. 


Disks are the media used for storing data. They are arranged either horizontally or 
vertically, depending on the type of DASD. As the disks rotate, the heads (which are 
carried on the access arms as shown in Figure 24), write and read data (bit 
patterns) on the disk surface. The read/write heads do not contact the disk surface 
but remain above it, on a cushion of air. The access arms and read/write heads 
move back and forth between the disks so that both sides of the disk surfaces can be 
accessed for data storage. 


[Figure: disks with access arms carrying read/write heads. Labels: Disk; Access 
Arms with Read/Write Heads.] 


Figure 24. Disks with Two Read/Write Heads per Surface 


Disk Surface Structure - Tracks 


The disk surface is subdivided into separate concentric circles, or tracks, for storing 
data. For a given DASD type, each track can hold the same amount of data (number 
of bytes). The number of tracks per disk surface and the number of bytes of data 
that can be stored on a track vary for the different DASD types. Refer to “Tracks and 
Records on CKD and FBA Devices” on page 73 for additional information on tracks 
and their function in structuring and addressing data. 


Access to the Disk Surface 


A set of access arms that carries read/write heads accesses the disk surface. An 
access arm moves all the heads at the same time and for the same distance. 
Therefore, when the heads are positioned at a track on one of the disk surfaces, a 
track at the comparable location at each of the other disk surfaces is accessible with 
only a small movement of the access arm. Although there are multiple read/write 
heads on each access arm, only one head transfers data at any given time. 


A set of access arms with read/write heads, the associated set of disk surfaces, and 
the electronic circuitry that controls access to the disk surfaces are collectively 
referred to as a device. Some types of DASD have only one device for each set of 
disks, but more recent types have two devices for each physical arrangement of 
disks. When there are two devices associated with each head-disk assembly, each 
is uniquely addressable and operates independently on a different subset of the disk 
surfaces. 


See “The Components of the Storage Subsystem” on page 78 for additional 
information on the elements of the DASD unit. 


Cylinders 


Tracks are grouped into sets called cylinders. Whereas CKD terminology implied 
that there was a geometric relationship between tracks in the same cylinder, 
extended count-key-data (ECKD*) defines a cylinder to be an arbitrary grouping of 
tracks. The only requirements are that all cylinders contain the same number of 
tracks, and that tracks within a cylinder be numbered consecutively, starting with 
zero. 


ECKD makes a specific point of discouraging geometry-dependent channel 
programs by systematically denying any reliable inference of device geometry 
underlying the track and cylinder arrangements. ECKD clearly promotes the 
construction of straightforward channel programs to perform data transfer 
operations. 


Packaging 


The disk and device components for the various types of DASD are packaged in 
different ways. The disk pack used on a 3330, and the data module used on the 3340 
can be moved from one disk drive mechanism to another by operations personnel. 
For 3344, 3350, 3370, 3375, 3380, 3390, 9332, and 9335 devices, the head-disk 
assemblies are sealed in an enclosure and permanently mounted on the drive 
mechanism. Only the IBM service representative can remove them. 


Nonremovable, sealed head-disk assemblies help to prevent problems that can 
cause data errors. 


* ECKD is a trademark of the International Business Machines Corporation. 


Tracks and Records on CKD and FBA Devices 


There are some concepts and techniques, common to most DASD types, that are 
key to understanding how data is arranged for storage on disk, how it is specified 
for selection, and how it is grouped for transfer between the processor and disk 
storage. Introductory information is provided here on those topics, along with some 
additional considerations for the application view of data. The information is 
presented first for count-key-data (CKD) devices and then for fixed-block 
architecture (FBA) devices. 


Logical and Physical Record Relationship 


The unit of data stored on disk is a physical record. A physical record contains both 
data and control information which describes the data and allows it to be checked 
for correctness when being read. The physical record is the unit of data that is 
transferred between disk storage and the processor. 


A logical record is the unit of data that is meaningful to the user and the processing 
programs. The user specifies the length and format of logical records. A single 
logical record can be the same length as a physical record, or multiple logical 
records can be blocked into one physical record. Usually, logical records are 
relatively small, and combining them into physical records has several advantages: 


• First, larger physical records result in better use of storage space on the track 
by reducing the amount of space required for addressing information and gaps 
and diminishing any leftover or wasted space. 


• In addition, blocking can increase processing efficiency by reducing the number 
of I/O operations required to process data. 


The operating system handles both the blocking of logical records into physical 
records for storage on disk, and the deblocking of records for use by the processing 
program. 
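

The effect of blocking on the number of I/O operations can be sketched in a few 
lines of Python. This is an illustration only, not how any operating system or 
access method implements blocking: the function names, the 80-byte record length, 
and the blocking factor of 10 are all assumptions chosen for the example. 

```python
def block_records(logical_records, blocking_factor):
    """Combine fixed-length logical records into physical records (blocks)."""
    if blocking_factor < 1:
        raise ValueError("blocking_factor must be at least 1")
    return [
        b"".join(logical_records[i:i + blocking_factor])
        for i in range(0, len(logical_records), blocking_factor)
    ]


def deblock(physical_record, record_length):
    """Split one physical record back into its logical records."""
    return [
        physical_record[i:i + record_length]
        for i in range(0, len(physical_record), record_length)
    ]


# 25 logical records of 80 bytes each (illustrative values).
records = [bytes([n]) * 80 for n in range(25)]
blocks = block_records(records, blocking_factor=10)

# Unblocked, each logical record would need its own transfer (25 in all);
# blocked, only len(blocks) physical records are transferred.
print(len(records), "logical records in", len(blocks), "physical records")
```

In this sketch the last block is short because 25 is not a multiple of 10; how a 
real access method handles a short final block depends on the record format in 
use. 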


It is possible for a logical record to extend beyond one physical record into another 
physical record, especially if the physical records are a predetermined fixed length 
that is relatively small. In this case, the operating system separates the portions of 
the logical record for storage on disk and then recombines them for use by the 
processing program. Depending on the access method used, the blocking and 
deblocking functions might not be performed automatically by the operating system. 


Physical Records (CKD Devices) 


With CKD format DASD, the needs of the application determine the length in bytes 
allocated for the data portion of the physical records. A channel program writes 
control information for each record in a predefined pattern for CKD format. One 
physical record is separated from the next physical record by a gap, or unused 
space, on the disk surface. 


Note: All of the disk storage types covered in this section use the count-key-data 
(CKD) format and control. Refer to “Count-Key-Data Record Format” on page 74 for 
more detailed descriptions of these record formats. 


A physical record on a CKD DASD type has a track address that identifies its 
location on the volume. A track address is four bytes in length and consists of a 
cylinder number and a head number: 


CC HH 


CC   Cylinder number: four hexadecimal characters 
HH   Head number: four hexadecimal characters 


For a specific device, both cylinders and heads are numbered sequentially beginning 
with zero. 


With two heads per surface, the two tracks accessed when the device is in a given 
position have the same cylinder number. There are two possible head address 
numbers per surface for each cylinder address number. For example, cylinder 
address 02 might have head addresses 02 and 03 on one surface, 04 and 05 on 
another surface, and so on, depending on the number of disk surfaces. 


For CKD format, a record is specified for selection by means of a 5-byte record 
identifier. This identifier may contain a track address and a record number, or a 
user-specified identifier. For DASD using the CKD record format, the unit of data 
that is transferred between disk storage and the processor might involve the entire 
record including the control areas, or it might involve only the data area or only 
the control information. 
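

The layout of the 4-byte track address and the 5-byte record identifier can be 
sketched with Python's struct module. The helper names are hypothetical, and the 
sketch assumes the common case in which the identifier holds a track address 
followed by a one-byte record number (CCHHR), stored with the high-order byte 
first. 

```python
import struct


def pack_track_address(cylinder, head):
    """Return the 4-byte track address CCHH (two bytes each)."""
    return struct.pack(">HH", cylinder, head)


def pack_record_id(cylinder, head, record):
    """Return the 5-byte record identifier CCHHR."""
    return struct.pack(">HHB", cylinder, head, record)


def unpack_record_id(raw):
    """Split a 5-byte CCHHR value into (cylinder, head, record)."""
    return struct.unpack(">HHB", raw)


# Cylinder 3, head 1, record 2 (illustrative values).
track = pack_track_address(3, 1)
record_id = pack_record_id(3, 1, 2)
print(track.hex().upper(), record_id.hex().upper())
```

Each two-byte field prints as four hexadecimal characters, which matches the CC 
and HH description above. 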


Count-Key-Data Record Format 


ECKD uses the same track addressing scheme as CKD. The track is the smallest 
directly addressable space on a device. Each track has an arbitrary starting point, 
called an index (see Figure 25). 


You usually visualize a track as occupying a full 360 degree rotation of the disk 
medium; however, ECKD specifically defines tracks in such a way that there is no 
assurance that a track corresponds to a full device rotation. 


Cylinder 03 

Head 00    Record 0    Record 1    Record 2 
Head 01    Record 0    Record 1    Record 2 


Figure 25. Count-Key-Data (CKD) Track and Record Formats 


The CKD format consists of the following elements: 


Home Address 


The home address is the first field on a CKD track; it identifies the track and 
defines its operational status. It is written on each track immediately after the 
index point. 


The home address area also includes descriptive information on the condition of the 
track. For most CKD direct access storage, there is information to control the 
skipping of defective areas on the track. A flag in the home address area indicates 
whether an entire track is defective or whether it is an alternate for another track 
that is defective. 


The home address is written during the manufacturing process and is sometimes 
rewritten by means of the Device Support Facilities in performing maintenance 
operations. 


Record Zero 


The track descriptor record is always the first record on the track following the 
home address. It is record zero (R0). If a flag in the home address area indicates 
the track is defective, then the count area of record zero contains the track address 
of the track that is to be used as an alternate. The count area of record zero on the 
alternate track contains the track address of the defective, primary track. The data 
area of a standard record zero is 8 bytes long and is initialized to zero. 


Record zero is written during the manufacturing process. When it is necessary to 
rewrite record zero in performing maintenance operations with the Device Support 
Facilities, all data on the track is erased. 


Data Records 


Following the track descriptor record (record zero), one or more user records can 
be written on the track. These records are typically numbered in sequence. Each of 
these physical records contains a count area, an optional key area, and a data area, 
each of which is separated by a gap. Checking information is added to each area 
when it is written and is used later for detecting and correcting data errors. 


Count Area contains the ID of the data area that follows. The record ID is 
specified by a value expressed as five bytes (CCHHR). In addition to the record 
ID, the count area specifies the length in bytes of the key and data areas of the 
record. 


Key Area is an optional portion of the record. It can be used by the programmer 
to identify the information in the data area of the record. (Note that there is no 
key area in the standard record zero.) 


Data Area contains data that has been organized and arranged by the 
programmer. 
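

As a rough illustration of how the count area describes the rest of the record, 
the following Python sketch decodes a count area into its fields. The field 
widths are an assumption of this sketch and are not stated in this manual: a 
5-byte record ID (CCHHR), a 1-byte key length, and a 2-byte data length, for 8 
bytes in total. The names CountArea and parse_count_area are hypothetical. 

```python
import struct
from collections import namedtuple

CountArea = namedtuple(
    "CountArea", "cylinder head record key_length data_length"
)


def parse_count_area(raw):
    """Decode an 8-byte count area (assumed layout: CCHHR, KL, DL)."""
    if len(raw) != 8:
        raise ValueError("this sketch expects an 8-byte count area")
    return CountArea(*struct.unpack(">HHBBH", raw))


# A standard record zero on cylinder 3, head 1: no key area and an
# 8-byte data area, as described in the text.
r0 = parse_count_area(bytes.fromhex("0003000100000008"))
print(r0)
```

A key length of zero corresponds to a record with no key area, which is the case 
for the standard record zero. 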


The number of records that can be placed on a track depends on the length of the 
data areas of the records, whether the records contain a key area, and the size of 
the gaps required by the DASD type. Records may be of equal or unequal lengths. 


For 3330, 3340, and 3350 disk storage, there is hardware support for extending a 
physical record from the end of one track to the beginning of the next. This is called 
record or track overflow. For each segment of the record that overflows to another 
track, there is separate control information (that is, a separate count area). For the 
3375, 3380, and 3390, a physical record cannot extend beyond the end of a track, 
because track overflow is not supported. 


The terms “block” and “physical block” are sometimes used to refer to the data 
area of the record. The size of the data area of the physical record is used in 
calculating the number of records that can be placed on a track. The manuals for 
the specific device types provide instructions and guidance for calculating block 
sizes for effective performance and use of space. 


Physical Records (FBA Devices) 


For fixed-block architecture (FBA) devices, the data area of all physical records has 
a standard length that is predetermined for a disk storage type. The user or 
application is not involved with specifying the number of bytes for the data area, and 
the control information is written on the track in a pre-established pattern during the 
manufacturing process. One physical record is separated from the next physical 
record by a gap, or unused space, on the disk surface. 


For FBA DASD, a relative block number identifies the location of a specific physical 
record. The entire volume is formatted into a continuous sequence of numbered 
blocks, arranged at evenly spaced intervals. 


Records selected for FBA format are specified by the relative block number. The 
storage subsystem converts the block number to a track address and sector 
location. 
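

The conversion from a relative block number to a track address and sector 
location can be pictured as integer division. The sketch below is a 
simplification under stated assumptions: the function name and the geometry 
values (62 blocks per track, 12 tracks per cylinder) are hypothetical, and the 
sketch ignores alternate blocks and any device-specific layout that a real 
storage subsystem takes into account. 

```python
def locate_block(block_number, blocks_per_track, tracks_per_cylinder):
    """Map a relative block number to (cylinder, head, sector)."""
    if block_number < 0:
        raise ValueError("block_number must not be negative")
    # Which track the block falls on, and its position within that track.
    track_index, sector = divmod(block_number, blocks_per_track)
    # Which cylinder the track belongs to, and the head within it.
    cylinder, head = divmod(track_index, tracks_per_cylinder)
    return cylinder, head, sector


# Hypothetical geometry, for illustration only.
print(locate_block(745, blocks_per_track=62, tracks_per_cylinder=12))
```

The point of the sketch is that the application deals only with the block number; 
the track and sector details stay inside the storage subsystem. 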


For DASD using the FBA record format, the unit of data that is transferred between 
disk storage and the processor can also involve the data area, control area, or both. 
In addition, it is possible for the operating system to combine multiple records for 
transfer as a control interval, which contains user data from multiple physical 
records plus certain control information. 


Fixed-Block Architecture Record Format 


For FBA disk storage, tracks and records are formatted at the time of manufacture. 
Each track is subdivided into a specific number of fixed-block records. Figure 26 
shows how the records are arranged on the track. 


[Figure: tracks subdivided into fixed-block records. Labels: Record 0, Record 1, 
Record 2, Record 20; Record 80, Record 81, Record 82.] 


Figure 26. Fixed-Block Architecture (FBA) Track and Record Formats 


Each fixed-block record has the following elements: 


An ID Area which contains the block number and the cylinder and head 
numbers. If the block is skipped because it is defective, information pointing to 
the alternate block is contained in the ID area of the defective block. The ID 
area of the alternate block contains information pointing to the defective block it 
is replacing. 


A Data Area, of a fixed number of bytes, to store user data. 
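

The two-way pointers between a defective block and its alternate can be modeled 
with a small lookup. This Python sketch is purely conceptual: the dictionary 
layout, the field names, and the block numbers are hypothetical and do not 
reflect the actual contents of an ID area. 

```python
def resolve_block(id_areas, block_number):
    """Return the block that actually holds the data for block_number."""
    entry = id_areas[block_number]
    if entry.get("defective"):
        # The defective block's ID area points to its alternate.
        return entry["alternate"]
    return block_number


# Hypothetical ID areas: block 100 is defective and block 9001 replaces it.
id_areas = {
    100: {"defective": True, "alternate": 9001},
    101: {},
    9001: {"replaces": 100},
}

print(resolve_block(id_areas, 100), resolve_block(id_areas, 101))
```

The alternate block's entry points back to the block it replaces, which mirrors 
the description of the ID area above. 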


Identification of Storage Subsystem Components 


Each disk storage subsystem component needs to be identifiable both for I/O 
selection by the operating system and for serviceability. There are two distinct 
types of designations for the disk storage components to fulfill these needs. 


• An I/O Address allows the system to select a specific device. 


• A Physical Identifier provides unique identification of each physical component 
of the storage subsystem. 


A high-level description of each of these identification concepts follows. For 
detailed information on I/O addressing and identification for the specific DASD 
types, see the appropriate manuals listed in the bibliography. 


I/O Address 


You assign the addresses that are used by the operating system to access each disk 
storage device for I/O operations. The addresses used for I/O selection are also 
reported in system console messages and included in various usage and EREP 
reports. 


Because a disk storage device can be accessed through several different channels 
and storage controls, a given disk storage device has multiple access paths and 
thus has multiple I/O addresses. 


Although a given disk storage volume can be selected by various I/O addresses that 
reflect different channels, the device portion of the address is always the same for a 
given device. Each device has a unique address that is physically set in the 
hardware at the time of manufacture and cannot be changed by the user. 
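

The relationship between access paths and I/O addresses can be sketched as 
follows. The sketch assumes, for illustration only, a three-digit hexadecimal 
address in which the first digit reflects the channel and the last two digits are 
the device portion; actual address formats depend on the hardware and operating 
system, and the function name is hypothetical. 

```python
def io_addresses(channels, device_portion):
    """List the I/O addresses of one device, one address per channel path."""
    return [f"{channel:X}{device_portion:02X}" for channel in channels]


# A device with device portion A0 reached through channels 1, 2, and 5
# (illustrative values).
addresses = io_addresses([1, 2, 5], 0xA0)
print(addresses)
```

Every address in the result ends in the same device portion, which is the 
property described in the paragraph above. 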


There are valid ranges for the device addresses in a storage subsystem 
configuration. The operating system might impose additional addressing 
constraints and conventions that are not required by the hardware. In addition, 
operating systems have their own techniques for device identification. For example, 
MVS uses unit addresses in System/370* mode and device numbers in Extended 
Architecture mode and ESA/370* mode. 


As configuration options for alternate access paths increase, it is important to 
understand and plan the addressing of I/O devices ahead of time. For specific 
information on bit composition and valid ranges for I/O addresses for 3380 DASD, 
see: 


• IBM 3380 Direct Access Storage Introduction 
• IBM 3380 Direct Access Storage Direct Channel Attach Model CJ2 Introduction 
and Reference. 


For similar information for the 3390, see: 


• Using IBM 3390 Direct Access Storage in an MVS Environment 
• Using IBM 3390 Direct Access Storage in a VM Environment 
• Using IBM 3390 Direct Access Storage in a VSE Environment. 


See other disk storage manuals listed in the bibliography for information on storage 
subsystems composed of other DASD types. 


* System/370 and ESA/370 are trademarks of the International Business Machines Corporation. 


Physical Identifier 


The 3375, 3380, and 3390 direct access storage and the 3880 and 3990 storage 
controls have physical identifiers. The primary purpose of the physical identifier is 
to describe a failing unit in an error situation, and physical IDs can also be used in 
configuration management. Physical IDs are especially useful for programs such as 
EREP, which correlate information on specific devices from multiple systems. 


For a storage subsystem that has a 3880 Storage Control, each storage director, 
string, and device has a unique physical ID. For a storage subsystem that has a 
3990 Storage Control, each logical subsystem, string, and device has a unique 
physical ID. These IDs (either the storage director or the subsystem, the string, and 
the device) cover a complete path of an error occurrence. 


The Components of the Storage Subsystem 


A storage subsystem contains disk storage, string controllers, and storage control 
hardware. Collectively, these components, and the software that supports them, 
form the DASD storage subsystem portion of the I/O subsystem in a processor 
complex. A general description of each of these components is provided here; see 
the hardware manuals listed in the bibliography for detailed information on specific 
storage hardware types and models. 


Disk Storage 


A direct access storage device is a complex unit consisting of various mechanical, 
electrical, and electronic subassemblies. 


The disks are the media, or surfaces, on which the data is stored. For some DASD 
types, the disks are stacked vertically and for other types, the disks are arranged on 
a horizontal axis. A drive mechanism controls the rotation of the disks. Some 
DASD types have multiple drive mechanisms in each unit. 


A set of access arms and the read/write heads attached to the access arms move 
as an independent component called an actuator. A set of access arms with 
read/write heads, the associated set of disk surfaces, and the electronic circuitry 
that controls access to the disk surfaces are collectively referred to as a device. 
The DASD space that is accessible by a specific device is referred to as a volume. 


The field replaceable unit that contains the device hardware is a head-disk 
assembly (HDA). Some types of DASD units contain more than one HDA. For 
example, the 3380 model AK4 has two HDAs, each of which contains two devices. A 
3390 A-unit can have two or four HDAs, while a B-unit can have two, four, or 
six HDAs. Each 3390 HDA contains two devices. 
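

The device counts that follow from these HDA figures are simple arithmetic, shown 
here as a short Python sketch. The names are hypothetical, and the example string 
of one A-unit and two B-units is chosen only for illustration; see the 3390 
manuals for the configurations that are actually supported. 

```python
DEVICES_PER_3390_HDA = 2

# HDA counts given in the text for each 3390 unit type.
VALID_HDA_COUNTS = {"A": (2, 4), "B": (2, 4, 6)}


def devices_in_unit(unit_type, hda_count):
    """Return the number of devices in one 3390 unit."""
    if hda_count not in VALID_HDA_COUNTS[unit_type]:
        raise ValueError(f"a 3390 {unit_type}-unit cannot have {hda_count} HDAs")
    return hda_count * DEVICES_PER_3390_HDA


# Hypothetical string: one A-unit with four HDAs, two B-units with six each.
string_units = [("A", 4), ("B", 6), ("B", 6)]
total_devices = sum(devices_in_unit(t, n) for t, n in string_units)
print(total_devices)
```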


“DASD Physical Characteristics” on page 71 provides additional information on 
how the DASD components function in I/O operations. 


Controllers 


One or more DASD units can be connected to form a string of DASD. The number of 
DASD units allowed in a string is dependent on the disk storage type; a string may 
include fewer than the maximum allowable number. 


The controller is an integral part of certain disk storage models and controls 
operations between the storage control and the devices. A DASD model that 
contains controller hardware (for example, a 3380 A-unit or C-unit) is often referred 
to as a head-of-string; the other DASD models in the string, B-units, do not contain 
controller hardware. 


A disk storage string has one or more controllers. Two controllers, for example, 
may be used on a 3350 where one operates as an alternate. A 3370 string has one 
controller. With the exception of the 3380 Model A04 string, a string of 3380s can 
have two concurrently active controllers when configured with a 3880 Storage 
Control; a 3380 4-path string attached to a 3990 Model 2 or 3 Storage Control can 
have four concurrently active controllers. 


A 3390 controller is sometimes referred to as a device adapter. This hardware 
component of a DASD head of string unit provides the path control and data transfer 
function. Four of these adapters (numbered 0-3) are contained in each 3390 A-unit 
and can be concurrently active. 


Storage Control 
One or more DASD strings can be attached to a storage control. A storage control 
handles interaction between the processor channel and the DASD; it executes 
channel commands, manages subsystem error recovery, controls the DASD 
devices, and manages cache (when applicable). Cache, electronic storage in the 
storage control, is used to retain frequently used data for faster access by the 
processor. 


The 3880 Storage Control has two storage directors, each of which has all the 
functions of an independent storage control unit and provides two independent 
paths for accessing DASD. 


The 3990 Model 2 or 3 Storage Control has two independently functional areas 
called storage clusters that have separate power and service regions. Each storage 
cluster provides access to DASD through two storage paths. Depending on the 
configuration, as many as four independent storage paths can be available to each 
device in the attached string. 


Direct Channel Attach 


The IBM 3380 Direct Channel Attach Model CJ2 provides 3380 disk storage, 
controller function for a DASD string, and storage control function in a single unit 
called a “C-unit.” This 3380 model can be directly attached to a host processor 
channel. See IBM 3380 Direct Access Storage Direct Channel Attach Introduction 
and Reference for further details. 



Appendix B. Specific Guidelines by DASD Type 


This appendix provides specific error handling guidelines for each DASD type. 
General error recovery procedures are described in “Chapter 2. The Error Handling 
Process” on page 21. Details on using the Device Support Facilities commands to 
perform specific functions are provided in “Chapter 5. Device Support Facilities” on 
page 57. 


The specific guideline information is presented in tabular form, with a separate 
table for each error condition that applies to the DASD type. (Because they are 
operationally similar, guidelines for the 3375 and 3380 are combined.) The tables 
are preceded by special instructions for each type, as required. 


The tables for each DASD type are arranged in the priority order in which you 
should treat the conditions. You can identify conditions from information obtained in 
the Service Information Messages report, if your device produces SIMs. If your 
device does not produce SIMs, use the DASD Data Transfer Summary report to 
obtain this information. These reports are described in “Using EREP to Generate 
Reports” on page 25. 


For SIM DASD, the media maintenance procedure number determines the priority 
with which you should treat the procedure task(s). If you have a non-SIM DASD, and 
a volume shows more than one condition in the DASD Data Transfer Summary 
report, treat the condition with the highest priority first, and this might take care of 
lower priority conditions. 


To use the tables, select the device type and condition that apply and then perform 
media maintenance actions as described. 
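

The "highest priority first" rule can be sketched as a simple lookup. In this 
Python illustration the condition names are taken from the 3375 and 3380 list 
later in this appendix; the function and variable names are hypothetical, and the 
sketch is not a substitute for the corrective-action tables themselves. 

```python
# 3375 and 3380 conditions, in the priority order given in this appendix.
CONDITIONS_3375_3380 = [
    "Temporary Data Checks with Offset at 3 or More Tracks",
    "Permanent Data Checks at 1 to 10 Tracks",
    "Temporary Data Checks",
    "Permanent Data Checks at 11 or More Tracks",
]


def first_to_treat(observed, priority_order):
    """Return the observed condition that should be treated first."""
    for condition in priority_order:
        if condition in observed:
            return condition
    return None


# Conditions reported for one volume (illustrative).
observed = {
    "Temporary Data Checks",
    "Permanent Data Checks at 1 to 10 Tracks",
}
print(first_to_treat(observed, CONDITIONS_3375_3380))
```

After the first condition is treated, review the reports again, because the 
action might also have taken care of the lower priority conditions. 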


Device Support Facilities Commands 


It is important to note that the command syntax shown in this appendix includes 
only relevant commands and parameters needed to illustrate the media 
maintenance actions. Most of the sample coding sequences shown in this 
appendix operate on one block or track at a time. 


See the Device Support Facilities User’s Guide and Reference for command and 
parameter syntax before you code any of these media maintenance actions. 


The following corrective-action tables are provided in this appendix. Error 
conditions are listed in the priority order in which you should handle them if more 
than one condition is present. You can identify the presence of these conditions by 
reviewing the Service Informational Messages report for SIM DASD, or the DASD 
Data Transfer Summary report for non-SIM DASD. These reports are discussed in 
detail in “Using EREP to Generate Reports” on page 25. 


3330 Conditions..... See “Error Handling for 3330” on page 83 


1. Permanent Data Checks at 1 to 10 Tracks 

2. Temporary Data Checks at Known Tracks 
3. Temporary Data Checks at Unknown Tracks 
4. Permanent Data Checks at 11 or More Tracks 


3340 Conditions..... See “Error Handling for the 3340” on page 86 


1. Permanent Data Checks at 1 to 10 Tracks 
2. Temporary Data Checks 
3. Permanent Data Checks at 11 or More Tracks 


3344 Conditions..... See “Error Handling for 3344” on page 88 


1. Permanent Data Checks at 1 to 10 Tracks 
2. Temporary Data Checks 
3. Permanent Data Checks at 11 or More Tracks 


3350 Conditions..... See “Error Handling for 3350” on page 90 


1. Permanent or Temporary Data Checks with 4940/4941 Symptom Codes 

2. Permanent Data Checks at 1 to 10 Tracks (<3 4940/4941 Symptom Codes) 
3. Temporary Data Checks at Known Tracks (<3 4940/4941 Symptom Codes) 
4. Temporary Data Checks at Unknown Tracks 

5. Permanent Data Checks at 11 or More Tracks 


3370 Conditions..... See “Error Handling for 3370” on page 94 


1. Permanent Data Checks at 1 to 10 Blocks 

2. Temporary Data Checks at Known Blocks 

3. Temporary Data Checks at Unknown Blocks 
4. Permanent Data Checks at 11 or More Blocks 


3375 and 3380 Conditions. .... See “Error Handling for 3375 and 3380” on page 96 


1. Temporary Data Checks with Offset at 3 or More Tracks 
2. Permanent Data Checks at 1 to 10 Tracks 

3. Temporary Data Checks 

4. Permanent Data Checks at 11 or More Tracks 


3390..... See “Error Handling for 3390” on page 102 


Procedures are odd numbers 1 through 9. 


1. 3390 Media Maintenance procedure 1 
2. 3390 Media Maintenance procedure 3 
3. 3390 Media Maintenance procedure 5 
4. 3390 Media Maintenance procedure 7 
5. 3390 Media Maintenance procedure 9 


9332 Conditions. .... See “Error Handling for 9332” on page 108 


1. Field Replaceable Unit (FRU) Replacement 
2. Alternate Block Assignment 
3. File Backup 


9335 Conditions..... See “Error Handling for 9335” on page 110 


1. Temporary Data Checks on Multiple Devices 
2. Permanent Data Checks at 1 or 2 Blocks 
3. Permanent Data Checks at 3 or More Blocks 
4. Temporary Data Checks at 1 to 3 Blocks 
5. Temporary Data Checks at 4 or More Blocks 
6. Sector Retry Exceeded 
7. Alternate Block Assignment Fails 


Error Handling for 3330 


Special Instructions 

Caution 

Be cautious when moving the disk pack to another drive. If there is a serious 
defect on the disk, you may damage a head on the other drive. 


For permanent data checks, Device Support Facilities automatically assigns an 
alternate track if defects on the track are confirmed with surface checking. 


For temporary checks with cylinder and head numbers, you may use the INSPECT 
command to unconditionally assign an alternate track, without surface checking the 
track. 


If excessive temporary data checks are reported in the DASD Exception report, use 
the ANALYZE command with NODRIVETEST and SCAN options to determine the 
location of the data checks. If surface defects are confirmed, the cylinder and head 
numbers of the tracks are reported in Device Support Facilities messages. If you 
decide to take action, you may then use the INSPECT command to unconditionally 
assign an alternate track. 


Before assigning an alternate track for temporary errors, be aware that when data is 
later read or written, performance can be affected by the time needed to detour to 
the alternate track and then return to the normal data location. 


3330 Condition 1 - Permanent Data Check at 1 to 10 Tracks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


You can try moving the disk pack to another drive and attempt to rerun the job. If the 
data check does not occur, call your IBM service representative for a possible hardware 
problem. If the data check does occur, return the disk pack to the original drive. 

When moving the disk pack to another drive, it is possible to damage a head at the other 
drive if there is a serious defect on the disk surface. 

Important: Because the data check is permanent, it is likely that the INSPECT sequence 
shown below will fall into the NOPRESERVE portion. Thus, your data will be lost and you 
will need to rely on other backup copies to recover. 


Use the following Device Support Facilities command sequences: 


ANALYZE NODRIVETEST SCAN -        Gives messages with cylinders and         Add new tracks that ANALYZE reports 
   CYLRANGE(cccc-5, cccc + 5)     head numbers for tracks, within the       as having uncorrectable data checks to 
                                  specified range, that have repeatable     those already established as 
                                  data checks.                              permanent data checks on the volume. 
                                                                            If the total is 10 or fewer, perform 
                                                                            INSPECT (as shown) for each track. If 
                                                                            the total is more than 10, call your IBM 
                                                                            service representative. 

INSPECT TRACK(cccc hhhh) -        Preserves data from track if readable.    If INSPECT executes and data was not 
   CHECK(1) -                     Checks track surface. Skips defects.      preserved, restore data from a backup 
   ASSIGN -                       Assigns alternate track if necessary. If  copy created before error occurred, 
   PRESERVE                       data is preserved, restores data.         and update as needed. 
IF LASTCC = 8 - 
THEN - 
INSPECT TRACK(cccc hhhh) - 
   CHECK(1) - 
   ASSIGN - 
   NOPRESERVE 


Note: Cylinder parameters cannot be entered exactly as specified, for example: 
CYLRANGE(X'cccc' + 1, X'cccc' + 8). The calculation must be performed first, then enter the result. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3330 Condition 2 - Temporary Data Checks at Known Tracks 


Applicable when you know the error locations and want to assign alternate tracks. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


INSPECT TRACKS(cccc hhhh) -       Preserves data from the track. Does 
   NOCHECK -                      not surface check. Unconditionally 
   ASSIGN -                       assigns an alternate track for the 
   PRESERVE                       specified track. 

                                  Restores data to alternate track. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3330 Condition 3 - Temporary Data Checks at Unknown Tracks 


Applicable when error locations (cylinder and head numbers) are unknown and you 
want to determine error locations. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Use Device Support Facilities     If temporary data checks are              Use Device Support Facilities INSPECT 
command ANALYZE NODRIVETEST       repeatable, gives message with            command as in Condition 2, if you want 
SCAN.                             cylinder and head numbers.                to assign alternate tracks. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3330 Condition 4 - Permanent Data Checks at 11 or More Tracks 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


Error Handling for the 3340 


3340 Condition 1 - Permanent Data Checks at 1 to 10 Tracks 

                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Move the data module to a different drive and attempt to read data. If the data check 
does not occur, call your IBM service representative for a possible hardware problem. 
If a data check does occur, return the data module to the original drive. 

Important: Because the data check is permanent, it is likely that the INSPECT sequence 
shown below will fall into the NOPRESERVE portion. Thus, your data will be lost and you 
will need to rely on other backup copies to recover. 


Use the following Device Support Facilities command sequences: 


ANALYZE NODRIVETEST SCAN -        Gives messages with cylinders and         Add new tracks that ANALYZE reports 
   CYLRANGE(cccc-5, cccc + 5)     head numbers for tracks, within the       as having uncorrectable data checks to 
                                  specified range, that have repeatable     those already established as 
                                  data checks.                              permanent data checks on the volume. 
                                                                            If the total is 10 or fewer, perform 
                                                                            INSPECT (as shown) for each track. If 
                                                                            the total is more than 10, call your IBM 
                                                                            service representative. 

INSPECT TRACK(cccc hhhh) -        Preserves data from track if readable.    If INSPECT executes and data was not 
   CHECK(1) -                     Checks track surface. Skips defects.      preserved, restore data from a backup 
   ASSIGN -                       Assigns alternate track if necessary. If  copy created before error occurred, 
   PRESERVE                       data is preserved, restores data.         and update as needed. 
IF LASTCC = 8 - 
THEN - 
INSPECT TRACK(cccc hhhh) - 
   CHECK(1) - 
   ASSIGN - 
   NOPRESERVE 


Note: Cylinder parameters cannot be entered exactly as specified, for example: 
CYLRANGE(X'cccc' + 1, X'cccc' + 8). The calculation must be performed first, then enter the result. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3340 Condition 2 - Temporary Data Checks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


INSPECT TRACKS(cccc hhhh) -       Preserves data if readable. Checks        If data could not be preserved, you 
   CHECK(1) -                     track surfaces. Skips defects. Assigns    might want to use INSPECT with 
   ASSIGN -                       an alternate track if necessary.          NOPRESERVE. 
   PRESERVE                       Restores data. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3340 Condition 3 - Permanent Data Checks at 11 or More Tracks 


Device Support Your Response to Device 
Your Actions Facilities Actions Support Facilities Actions 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3344 Condition 1 - Permanent Data Checks at 1 to 10 Tracks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Important: Because the data check is permanent, it is likely that the command sequence 
shown below will fall into the INSPECT NOPRESERVE portion. Thus, your data will be lost 
and you will need to rely on other backup copies to recover. 


Use the following Device Support Facilities command sequences: 


ANALYZE DRIVETEST SCAN -          Exercises hardware. Gives messages        If “Suspected Drive Problem” message 
   CYLRANGE(cccc-5, cccc + 5)     with cylinders and head numbers for       from ANALYZE, use appropriate utility 
                                  tracks, within the specified range, that  or program to temporarily dump as 
                                  have repeatable data checks.              much data as possible from all volumes 
                                                                            of head disk assembly to another 
                                                                            volume. Call your service 
                                                                            representative for a possible hardware 
                                                                            problem. 

                                                                            Add new tracks that ANALYZE reports 
                                                                            as having uncorrectable data checks to 
                                                                            those already established as 
                                                                            permanent data checks on the volume. 
                                                                            If the total is 10 or fewer, perform 
                                                                            INSPECT (as shown) for each track. If 
                                                                            the total is more than 10, call your IBM 
                                                                            service representative. 

INSPECT TRACK(cccc hhhh) -        Preserves data from track if readable.    If INSPECT executes and data was not 
   CHECK(1) -                     Checks track surface. Skips defects.      preserved, restore data from a backup 
   ASSIGN -                       Assigns alternate track if necessary. If  copy created before error occurred, 
   PRESERVE                       data is preserved, restores data.         and update as needed. 
IF LASTCC = 8 - 
THEN - 
INSPECT TRACK(cccc hhhh) - 
   CHECK(1) - 
   ASSIGN - 
   NOPRESERVE 


Note: Cylinder parameters cannot be entered exactly as specified, for example: 
CYLRANGE(X'cccc' + 1, X'cccc' + 8). The calculation must be performed first, then enter the result. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3344 Condition 2 - Temporary Data Checks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          call your IBM service representative for 
THEN -                            diagnostic message, “Suspected Drive      a possible hardware problem. 
INSPECT TRACK(cccc hhhh) -        Problem.” 
   CHECK(1) -                                                               If data could not be preserved, you may 
   ASSIGN -                       If ANALYZE does not detect hardware       want to try INSPECT with NOPRESERVE. 
   PRESERVE                       problem, executes INSPECT. If data can 
                                  be preserved, checks track surfaces. 
                                  Skips defect. Assigns an alternate track 
                                  if necessary. 

                                  Restores data. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3344 Condition 3 - Permanent Data Checks at 11 or More Tracks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


                                  None                                      None 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


Error Handling for 3350 


Special Instructions 


If permanent or temporary data checks occur on three to ten tracks, examine bytes 
22 and 23 of the sense information for each track address. Sense information is 
provided in the DASD Data Transfer Summary report, as shown in Figure 27. 


UNITADDRESS 0B54 DEVTYPE 3350 VOLUME ICOM15 
CPU B PHYSICAL ADDRESS 0B54 


FAILURE AT ADDRESS: CYLINDER 0003 HEAD 08 0 2 0 0 


FAILURE AT ADDRESS: CYLINDER 0018 HEAD 08 0 1 0 0 


LAST SENSE AT: 225/86 22:52:06:17 
Figure 27. Example of Locating 3350 Symptom Code 


Bytes 22 and 23 of the sense information (shaded area) are the symptom code. 
Codes 4940 and 4941 indicate errors in a home address or count area. If 4940 or 
4941 is indicated for three or more tracks, special treatment is needed, as described 
in the Condition 1 table. If you cannot copy your data, your IBM service 
representative might be able to help. 
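
The symptom-code check described above can be sketched in a few lines of Python. This is only an illustration: it assumes that each track's sense information has been transcribed from the report as 24 bytes of hexadecimal text, and that bytes are numbered from 0 so that bytes 22 and 23 are the last four hex digits. The function names and the sample sense strings are hypothetical and are not part of EREP or Device Support Facilities.

```python
from typing import Dict, Tuple

# Symptom codes that the text associates with home address or count area errors.
HA_COUNT_CODES = {"4940", "4941"}


def symptom_code(sense_hex: str) -> str:
    """Return bytes 22 and 23 of a 24-byte sense record as four hex digits.

    Assumption: bytes are numbered from 0, so these are the last two bytes.
    """
    digits = "".join(sense_hex.split()).upper()
    if len(digits) != 48:
        raise ValueError("expected 24 sense bytes (48 hex digits)")
    return digits[44:48]


def needs_condition_1(sense_by_track: Dict[Tuple[int, int], str]) -> bool:
    """True when three or more tracks report symptom code 4940 or 4941."""
    hits = [track for track, sense in sense_by_track.items()
            if symptom_code(sense) in HA_COUNT_CODES]
    return len(hits) >= 3


if __name__ == "__main__":
    # Hypothetical sense records keyed by (cylinder, head).
    sample = {
        (3, 8): "00001000 008F2053 01220000 08951000 0600007A 00004940",
        (18, 8): "00001000 008F2053 01220000 08951000 0600007A 00004941",
        (40, 2): "00001000 008F2053 01220000 08951000 0600007A 00004940",
    }
    print(needs_condition_1(sample))
```

With fewer than three matching tracks the function returns False, which corresponds to handling the tracks under Conditions 2 or 3 instead.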


If temporary data checks, which exceed acceptable limits for your processing 
complex, are reported in the Subsystem Exception report but cylinder and head 
numbers are not given in the DASD Data Transfer Summary report, you may use 
ANALYZE SCAN to assist in determining the location of data checks. 


A 3350 processing in 3330 emulation mode can be examined as a 3330, by following 
the 3330 guidelines. First, use the ANALYZE DRIVETEST function to determine if 
there is a suspected drive problem. If checking the 3350 as a 3330, defects are not 
skipped. However, you may want to perform error handling as it applies to a 3350 to 
obtain defect skipping. Follow these steps: 

1. Copy the 3330 volume(s) to another volume. 
2. Call your IBM service representative to put the volume in 3350 (native) mode. 
3. Use the Device Support Facilities INIT command with CHECK(3). The program 
   surface-checks all disks of the 3350 head and disk assembly, and skips any 
   defects. 
4. Have your IBM service representative put the volume back into emulation mode. 
5. Re-initialize the 3330 volumes, using the INIT command with VALIDATE. 
6. Restore the data. 


3350 Condition 1 - Permanent or Temporary Data Checks with 4940/4941 
Symptom Code 


Applicable when at least three tracks show sense bytes 22 and 23 equal to 4940 or 
4941. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Use appropriate utilities or program to copy data from volume temporarily to 
another volume. 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          call your IBM service representative for 
THEN -                            diagnostic message, “Suspected Drive      a possible hardware problem. 
INIT VALIDATE                     Problem.” 
                                                                            If INIT executed, restore data from 
                                  If no hardware problem detected,          temporary copy. 
                                  executes INIT. 

                                  Rewrites home address and record 
                                  zero of all tracks. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3350 Condition 2 - Permanent Data Checks at 1 to 10 Tracks 


Applicable when fewer than three tracks show 4940 or 4941 in sense bytes 22 
and 23. Follow the instructions for Condition 1 when the symptom code is 4940 or 
4941 at three or more tracks. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Important: Because the data check is permanent, it is likely that the command sequence 
shown below will fall into the INSPECT NOPRESERVE portion. Thus, your data will be lost 
and you will need to rely on other backup copies to recover. 


Use the following Device Support Facilities command sequences: 


ANALYZE DRIVETEST SCAN -          Exercises hardware. Gives messages        If “Suspected Drive Problem” message 
   CYLRANGE(cccc-5, cccc + 5)     with cylinders and head numbers for       from ANALYZE, use appropriate utility 
                                  tracks, within the specified range, that  or program to temporarily dump as 
                                  have repeatable data checks.              much data as possible from all volumes 
                                                                            of head disk assembly to another 
                                                                            volume. Call your IBM service 
                                                                            representative for a possible hardware 
                                                                            problem. 

                                                                            Add new tracks that ANALYZE reports 
                                                                            as having uncorrectable data checks to 
                                                                            those already established as 
                                                                            permanent data checks on the volume. 
                                                                            If the total is 10 or fewer, perform 
                                                                            INSPECT (as shown) for each track. If 
                                                                            the total is more than 10, call your IBM 
                                                                            service representative. 

INSPECT TRACK(cccc hhhh) -        Preserves data from track if readable.    If INSPECT executes and data was not 
   CHECK(1) -                     Checks track surface. Skips defects.      preserved, restore data from a backup 
   ASSIGN -                       Assigns alternate track if necessary. If  copy created before error occurred, 
   PRESERVE                       data is preserved, restores data.         and update as needed. 
IF LASTCC = 8 - 
THEN - 
INSPECT TRACK(cccc hhhh) - 
   CHECK(1) - 
   ASSIGN - 
   NOPRESERVE 


Note: Cylinder parameters cannot be entered exactly as specified, for example: CYLRANGE(X'cccc' +1, X'cccc' +8). The 
calculation must be performed first, then enter the result. 
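
Because the note requires the arithmetic to be done before the command is coded, a small helper can produce the finished CYLRANGE operand. The following Python sketch is an illustration only: the function name, the spread of five cylinders on each side (taken from the sample sequence), the optional last-cylinder limit, and the X'cccc' hexadecimal output form (taken from the note) are assumptions. Check the Device Support Facilities User's Guide and Reference for the operand forms that your release accepts.

```python
from typing import Optional


def cylrange(failing_cyl: int, spread: int = 5,
             last_cyl: Optional[int] = None) -> str:
    """Build a CYLRANGE operand with the arithmetic already performed.

    failing_cyl: cylinder number of the failing track, as an integer.
    spread:      cylinders to scan on each side (5 in the sample sequence).
    last_cyl:    optional highest cylinder on the volume (hypothetical
                 parameter used only to clamp the upper bound).
    """
    low = max(0, failing_cyl - spread)          # never go below cylinder 0
    high = failing_cyl + spread
    if last_cyl is not None:
        high = min(high, last_cyl)              # never go past the volume
    return f"CYLRANGE(X'{low:04X}', X'{high:04X}')"


if __name__ == "__main__":
    print(cylrange(0x0018))
    print(cylrange(3))
```

The lower bound is clamped at cylinder 0 so that a failure near the start of the volume does not produce a negative value.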


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3350 Condition 3 - Temporary Data Checks at Known Tracks 


Applicable when no more than two tracks show 4940 or 4941 in sense bytes 22 
and 23. Follow the instructions for Condition 1 when the symptom code is 4940 or 
4941 at three or more tracks. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          call your IBM service representative for 
THEN -                            diagnostic message, “Suspected Drive      a possible hardware problem. 
INSPECT TRACK(cccc hhhh) -        Problem.” 
   CHECK(1) -                                                               If data cannot be preserved, you may 
   ASSIGN -                       If no hardware problem detected,          want to try INSPECT with NOPRESERVE. 
   PRESERVE                       executes INSPECT. If data can be 
                                  preserved, checks track surfaces. 
                                  Skips defect. Assigns an alternate track 
                                  if necessary. 

                                  Restores data. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3350 Condition 4 - Temporary Data Checks at Unknown Tracks 


Applicable when error locations (cylinder and head numbers) are unknown and you 
want to determine error locations. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


ANALYZE DRIVETEST SCAN            Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
                                  detects hardware problem, issues          call your IBM service representative for 
                                  diagnostic message, “Suspected Drive      a possible hardware problem. 
                                  Problem.” 
                                                                            Use INSPECT as in Condition 3. 
                                  Gives messages with cylinders and 
                                  head numbers for tracks that have 
                                  repeatable data checks. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3350 Condition 5 - Permanent Data Checks at 11 or More Tracks 


Device Support Your Response to Device 
Your Actions Facilities Actions Support Facilities Actions 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


Error Handling for 3370 


Special Instructions 
For temporary errors on 3370s not attached directly to a 4321 or 4331, you should 
take action whenever the error rate threshold is exceeded. Examine the DASD Data 
Transfer Summary report to help determine which blocks to check when the 
temporary threshold is exceeded. It is important to understand that the blocks 
indicated show the location at which an operation was in progress when the data 
error rate threshold for the volume was exceeded. This is not necessarily the only 
failing address; errors might have accumulated at other blocks on the volume until 
the error at this failing address caused the threshold to overflow. 


Whenever the symptom code (last 4 digits of the sense information on the DASD 
Data Transfer Summary report) is 4940 at more than three tracks, you should 
contact your hardware service representative. 


3370 Condition 1 - Permanent Data Checks at 1 to 10 Blocks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Important: Because the data check is permanent, it is likely that the command sequence 
shown below will fall into the INSPECT NOPRESERVE portion. Thus, your data will be lost 
and you will need to rely on other backup copies to recover. 


Use the following Device Support Facilities command sequence: 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          use appropriate utility or program to 
THEN DO                           diagnostic message, “Suspected Drive      copy as much data as possible from all 
   INSPECT BLOCK(rbn) -           Problem.”                                 volumes of head disk assembly to other 
      CHECK(1) -                                                            volumes. Call your IBM service 
      ASSIGN -                    If ANALYZE test does not detect           representative for a possible hardware 
      PRESERVE                    hardware problem, executes INSPECT.       problem. 
   IF LASTCC = 8 - 
   THEN -                         Preserves data from block if readable.    If INSPECT executes and data was not 
   INSPECT BLOCK(rbn) -           Checks block surface. If defects          preserved, restore data from backup 
      CHECK(1) -                  confirmed, assigns alternate block. If    copy created before error occurred, 
      ASSIGN -                    data was preserved, restores data.        and update as needed. 
      NOPRESERVE 
END 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


3370 Condition 2 - Temporary Data Checks at Known Blocks 


Applicable when the 3370 is attached directly to a 4321 or 4331 and temporary data 
checks occur, or when the 3370 is not attached directly to a 4321 or 4331 and the 
data error rate threshold is exceeded. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          call your IBM service representative for 
THEN -                            diagnostic message, “Suspected Drive      a possible hardware problem. 
INSPECT BLOCK(rbn) -              Problem.” 
   CHECK(1) -                                                               If data could not be preserved, you may 
   ASSIGN -                       If ANALYZE test does not detect           want to try INSPECT with NOPRESERVE. 
   PRESERVE                       hardware problem, executes INSPECT. 
                                  If data can be preserved, checks block 
                                  surface. If defect confirmed, assigns 
                                  alternate block. Restores data. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3370 Condition 3 - Temporary Data Checks at Unknown Blocks 


Applicable when the 3370 is attached directly to a 4321 or 4331 and temporary data 
checks occur. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


ANALYZE DRIVETEST SCAN            Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
                                  detects hardware problem, issues          call your IBM service representative for 
                                  diagnostic message, “Suspected Drive      a possible hardware problem. 
                                  Problem.” 
                                                                            Use INSPECT as in Condition 2 if you 
                                  Gives messages with relative block        want to assign alternate tracks. 
                                  numbers for blocks that have 
                                  repeatable data checks. If temporary 
                                  checks are repeatable, gives message 
                                  with cylinder and head number of 
                                  blocks with defects. 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


3370 Condition 4 - Permanent Data Checks at 11 or More Blocks 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


| VM users operating on volumes that are not attached to a userid should see 
| “Performing Media Maintenance in VM” on page 59. 


Error Handling for 3375 and 3380 


Special Instructions 


Examine the DASD Data Transfer Summary report to determine which tracks to 
check when the temporary threshold is exceeded. For temporary errors, you should 
take action whenever the error rate threshold is exceeded. The value (number of 
times threshold was exceeded) is listed in the TEMPORARY column beside the 
address that failed when the threshold was reached. 


Temporary errors can be recovered with or without offset invoked. (When offset is 
invoked, the error is successfully recovered by retrying the operation with the head 
in an offset position on the track.) Temporary errors with and without offset invoked 
are treated differently, as described in the condition tables. The following material 
describes which tracks to treat under each of these circumstances. 


With Offset Invoked 


If temporary errors are retried with offset invoked, you might need to rewrite the 
home address. Errors recovered by retry with offset invoked are listed for the 3375 
and 3380 in the DASD Data Transfer Summary report under the TEMPORARY 
OFFSET INVK YES column, as shown in Figure 28. In the illustration, offset is 
invoked on many scattered tracks on the volume. If offset is invoked on three or 
more tracks, special treatment is needed, as described in “3375 and 3380 Condition 
1 - Temporary Data Checks with Offset at 3 or More Tracks” on page 99. For this 
error type, each occurrence at each location is included in the count, which is 8 in 
the example shown here. 
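
The two rules in this paragraph (Condition 1 applies when offset is invoked on three or more tracks, and every occurrence at every location is counted) can be sketched as follows. This Python fragment is an illustration under stated assumptions: the row layout (cylinder, head, count from the TEMPORARY OFFSET INVK YES column) and the function names are hypothetical, and the per-track counts in the sample are invented so that they total 8, as in the example.

```python
from typing import Iterable, List, Tuple

# One row per FAILURE AT ADDRESS entry:
# (cylinder, head, count from the TEMPORARY OFFSET INVK YES column)
OffsetRow = Tuple[int, int, int]


def offset_tracks(rows: Iterable[OffsetRow]) -> List[Tuple[int, int]]:
    """Distinct tracks on which offset was invoked at least once."""
    return sorted({(cyl, head) for cyl, head, count in rows if count > 0})


def total_offset_occurrences(rows: Iterable[OffsetRow]) -> int:
    """Every occurrence at every location is included in the count."""
    return sum(count for _, _, count in rows)


def offset_condition_applies(rows: Iterable[OffsetRow]) -> bool:
    """Condition 1 applies when offset is invoked on three or more tracks."""
    return len(offset_tracks(list(rows))) >= 3


if __name__ == "__main__":
    # Hypothetical counts; the addresses follow the example report.
    rows = [(355, 0, 3), (355, 1, 2), (770, 2, 2), (328, 5, 1)]
    print(offset_tracks(rows))
    print(total_offset_occurrences(rows))
    print(offset_condition_applies(rows))
```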


                                                        SENSE COUNTS 
                                                         TEMPORARY 
                                                        OFFSET INVK     THRESHOLD 
                                                PERM     NO    YES       LOGGING 
**************************************************************************************** 
UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091 
CPU C PHYSICAL ADDRESS XX-10-00 

FAILURE AT ADDRESS: CYLINDER 0355 HEAD 00 0 0 
00001000 008F205B 01220000 08951000 0600007A 00010000 
LAST SENSE AT: 225/86 23:41:29:82 

FAILURE AT ADDRESS: CYLINDER 0355 HEAD 01 0 0 
00001000 008F205B 01220000 08951000 0600007A 0001C000 
LAST SENSE AT: 224/86 14:23:39:42 

FAILURE AT ADDRESS: CYLINDER 0770 HEAD 02 0 0 
00001000 0002315B 01950001 08951000 06000F5A 00400000 
LAST SENSE AT: 224/86 16:13:20:39 

FAILURE AT ADDRESS: CYLINDER 0328 HEAD 05 0 0 
00001000 0048125B 01480002 195C1000 0600072A 00100000 
LAST SENSE AT: 225/86 22:34:11:07 


Figure 28. Example of DASD Data Transfer Summary Report, with Offset Invoked 


Without Offset Invoked 


The threshold for temporary data errors represents an accumulation of temporary 
error counts for a volume. A value other than zero in the TEMPORARY OFFSET 
INVK NO column identifies a track address where an operation was in progress 
when error logging was started. The value shown in the THRESHOLD LOGGING 
column is the total number of temporary errors logged for that cylinder/head 
address during the error reporting interval. 


To determine which tracks to check for temporary errors without offset invoked, see 
the examples in Figure 29 on page 98. Add the value in the TEMPORARY OFFSET 
INVK NO column to the value in the THRESHOLD LOGGING column. The tracks with 
the highest total values are having the greatest impact on your system. Media 
maintenance should be performed on a track that has a total of two or more, unless 
a hardware failure condition exists. A track with a total of one can be ignored. 
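
The selection rule above (add the TEMPORARY OFFSET INVK NO value to the THRESHOLD LOGGING value, act on totals of two or more, ignore totals of one) can be sketched in Python. The row layout, the function name, and the sample values are hypothetical; the sketch also does not model the exception for an existing hardware failure condition.

```python
from typing import Iterable, List, Tuple

# One row per FAILURE AT ADDRESS entry:
# (cylinder, head, TEMPORARY OFFSET INVK NO value, THRESHOLD LOGGING value)
Row = Tuple[int, int, int, int]


def tracks_needing_maintenance(rows: Iterable[Row],
                               minimum_total: int = 2) -> List[Tuple[int, int, int]]:
    """Return (cylinder, head, total) for tracks at or above the minimum total.

    The result is ordered by total, highest first, because the tracks with
    the highest totals have the greatest impact on the system.
    """
    totals = [(cyl, head, temp_no + logged)
              for cyl, head, temp_no, logged in rows]
    selected = [entry for entry in totals if entry[2] >= minimum_total]
    return sorted(selected, key=lambda entry: entry[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical values, not taken from the figure.
    rows = [(655, 0, 1, 17), (770, 1, 1, 0), (328, 2, 0, 1), (532, 7, 0, 3)]
    for cyl, head, total in tracks_needing_maintenance(rows):
        print(f"cylinder {cyl:04d} head {head:02d} total {total}")
```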


Examine the three examples in Figure 29 on page 98. In each case, the volume has 
exceeded the threshold two times, but the error distribution is different in each 
example. In Example 1, the tracks at cylinder 0328, head 02 and cylinder 0532, head 
07 can be ignored. In contrast, all four tracks reported in Example 2 and all three 
tracks reported in Example 3, should be considered for media maintenance. 


                                                        SENSE COUNTS 
                                                         TEMPORARY 
                                                        OFFSET INVK     THRESHOLD 
                                                PERM     NO    YES       LOGGING 
**************************************************************************************** 
UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091 EXAMPLE 1 
CPU C PHYSICAL ADDRESS XX-10-00 

FAILURE AT ADDRESS: CYLINDER 0655 HEAD 00 0 
00001000 008F2053 01220000 08951000 0600007A 0001C000 
LAST SENSE AT: 224/86 23:41:29:82 

FAILURE AT ADDRESS: CYLINDER 0770 HEAD 01 0 
00001000 00023153 01950001 08951000 06000F5A 00400000 
LAST SENSE AT: 224/86 16:13:20:39 

17 

FAILURE AT ADDRESS: CYLINDER 0328 HEAD 02 0 
00001000 00481253 01480002 195C1000 0600072A 00100000 
LAST SENSE AT: 224/86 22:34:11:07 

FAILURE AT ADDRESS: CYLINDER 0532 HEAD 07 0 
00001000 00142753 00A70007 07811000 06000284 06000000 
LAST SENSE AT: 224/86 23:14:05:02 

**************************************************************************************** 
UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091 EXAMPLE 2 
CPU C PHYSICAL ADDRESS XX-10-00 

FAILURE AT ADDRESS: CYLINDER 0655 HEAD 00 0 
00001000 008F2053 01220000 08951000 0600007A 0001C000 
LAST SENSE AT: 225/86 23:41:29:82 

FAILURE AT ADDRESS: CYLINDER 0770 HEAD 01 0 
00001000 00023153 01950001 08951000 06000F5A 00400000 
LAST SENSE AT: 225/86 16:13:20:39 

16 

FAILURE AT ADDRESS: CYLINDER 0328 HEAD 02 0 
00001000 00481253 01480002 195C1000 0600072A 00100000 
LAST SENSE AT: 225/86 22:34:11:07 

FAILURE AT ADDRESS: CYLINDER 0532 HEAD 07 0 
00001000 00142753 00A70007 07811000 06000284 06000000 
LAST SENSE AT: 225/86 23:14:05:02 

**************************************************************************************** 
UNITADDRESS 07C3 DEVTYPE 3380 VOLUME PSG091 EXAMPLE 3 
CPU C PHYSICAL ADDRESS XX-10-00 

FAILURE AT ADDRESS: CYLINDER 0655 HEAD 00 0 
00001000 008F2053 01220000 08951000 0600007A 0001C000 
LAST SENSE AT: 226/86 23:41:29:82 

FAILURE AT ADDRESS: CYLINDER 0770 HEAD 01 0 
00001000 00023153 01950001 08951000 06000F5A 00400000 
LAST SENSE AT: 226/86 16:13:20:39 

16 

FAILURE AT ADDRESS: CYLINDER 0328 HEAD 02 0 
00001000 00481253 01480002 195C1000 0600072A 00100000 
LAST SENSE AT: 226/86 22:34:11:07 

22 


Figure 29. Example of DASD Data Transfer Summary Report, without Offset Invoked 


3375 and 3380 Condition 1 - Temporary Data Checks with Offset at 3 or More 
Tracks 


Applicable when offset is invoked at three or more tracks. Refer to the special 
instructions at the beginning of this section. 


                                  Device Support                            Your Response to Device 
Your Actions                      Facilities Actions                        Support Facilities Actions 


Use appropriate utilities or program to copy data from volume temporarily to 
another volume. 


Use the following Device Support Facilities command sequence: 


ANALYZE DRIVETEST NOSCAN          Exercises hardware. If ANALYZE test       If “Suspected Drive Problem” message, 
IF LASTCC < 8 -                   detects hardware problem, issues          call your IBM service representative for 
THEN -                            diagnostic message, “Suspected Drive      a possible hardware problem. 
INIT VALIDATE                     Problem.” 
                                                                            If INIT executes, restore data from 
                                  If ANALYZE test does not detect           temporary copy. 
                                  hardware problem, executes INIT. 

                                  Rewrites home address and record 
                                  zero on each track of volume. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO 
parameter allows you to specify the proper volume of the pair. See “Performing 
Media Maintenance on Dual Copy Volumes” on page 59.


3375 and 3380 Condition 2 - Permanent Data Checks at 1 to 10 Tracks


Your Actions

Important: Because the data check is permanent, it is likely that the command
sequence shown below will fall into the INSPECT NOPRESERVE portion. Thus,
your data will be lost and you will need to rely on other backup copies to
recover.

Use the following Device Support Facilities command sequences:

   ANALYZE DRIVETEST SCAN -
      CYLRANGE(cccc-5, cccc+5)

Device Support Facilities Actions

Exercises hardware. Gives messages with cylinders and head numbers for tracks,
within the specified range, that have repeatable data checks.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, use appropriate utility
or program to temporarily dump as much data as possible from all volumes of
the head disk assembly to another volume. Call your IBM service representative
for a possible hardware problem.

Add new tracks that ANALYZE reports as having uncorrectable data checks to
those already established as permanent data checks on the volume. If the total
is 10 or fewer, perform INSPECT (as shown) for each track. If the total is more
than 10, call your IBM service representative.

Your Actions

   INSPECT TRACK(cccc hhhh) -
      CHECK(1) -
      ASSIGN -
      PRESERVE
   IF LASTCC = 8 -
   THEN -
   INSPECT TRACK(cccc hhhh) -
      CHECK(1) -
      ASSIGN -
      NOPRESERVE

Device Support Facilities Actions

Preserves data from track if readable. Checks track surface. Skips defects.
Assigns alternate track if necessary. If data is preserved, restores data.

Your Response to Device Support Facilities Actions

If INSPECT executes and data was not preserved, restore data from a backup
copy created before the media problem occurred, and update as needed.


Note: Cylinder parameters cannot be entered exactly as specified, for example: CYLRANGE(X'cccc' + 1, X'cccc' +8). The 
calculation must be performed first, then enter the result. 
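The arithmetic that the note asks for can be done ahead of time with a few lines of script. The following is a minimal sketch, not part of Device Support Facilities: the function name `cylrange`, its parameters, and the optional `max_cyl` limit are hypothetical, and the output uses the X'cccc' notation shown in the note. The cylinder is passed as an integer, so a cylinder number taken from a report must first be converted from whatever base the report uses.

```python
from typing import Optional


def cylrange(center: int, below: int, above: int,
             max_cyl: Optional[int] = None) -> str:
    """Return a CYLRANGE operand with the arithmetic already performed.

    center  -- failing cylinder, as an integer
    below   -- number of cylinders to include below the failing cylinder
    above   -- number of cylinders to include above the failing cylinder
    max_cyl -- highest valid cylinder for the device (hypothetical parameter;
               supply the value for your device type if you want clamping)
    """
    if center < 0 or below < 0 or above < 0:
        raise ValueError("cylinder and offsets must be non-negative")
    low = max(center - below, 0)          # never go below cylinder 0
    high = center + above
    if max_cyl is not None:
        high = min(high, max_cyl)         # never go past the last cylinder
    return f"CYLRANGE(X'{low:04X}',X'{high:04X}')"


# Cylinder 655 with five cylinders on either side, as in cccc-5 to cccc+5.
print(cylrange(655, 5, 5))
```

The printed operand can then be typed into the ANALYZE statement in place of the symbolic `cccc-5, cccc+5` form.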


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO
parameter allows you to specify the proper volume of the pair. See “Performing
Media Maintenance on Dual Copy Volumes” on page 59.


3375 and 3380 Condition 3 - Temporary Data Checks


Applicable for temporary checks. However, if offset has been invoked at three or 
more tracks, follow the instructions for Condition 1. 


If data checks show a pattern (for example, are clustered around a single head),
use the following Device Support Facilities command sequences:

Your Actions

   ANALYZE DRIVETEST SCAN -
      HEADRANGE(hhhh, hhhh)

Device Support Facilities Actions

Exercises hardware. Gives messages with cylinders and head numbers for tracks,
within the specified range, that have repeatable data checks.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, use appropriate utility
or program to temporarily dump as much data as possible from all volumes of
the head disk assembly to another volume. Call your IBM service representative
for a possible hardware problem.

Add the new tracks that ANALYZE reports as having data checks to those
already established for this head; proceed to the next step for each track.

Your Actions

   INSPECT TRACK(cccc hhhh) -
      CHECK(1) -
      ASSIGN -
      PRESERVE

Device Support Facilities Actions

Preserves data from track if readable. Checks track surface. Skips defects.
Assigns alternate track if necessary. Restores data.

Your Response to Device Support Facilities Actions

If data could not be preserved, you may want to try INSPECT with NOPRESERVE.


If data checks do not follow a pattern, use the following Device Support
Facilities command sequence:

Your Actions

   ANALYZE DRIVETEST NOSCAN
   IF LASTCC < 8 -
   THEN -
   INSPECT TRACK(cccc hhhh) -
      CHECK(1) -
      ASSIGN -
      PRESERVE

Device Support Facilities Actions

Exercises hardware. Preserves data from track if readable. Checks track
surface. Skips defects. Assigns alternate track if necessary. Restores data.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, use appropriate utility
or program to temporarily dump as much data as possible from all volumes of
the head disk assembly to another volume. Call your IBM service representative
for a possible hardware problem.

If data could not be preserved, you may want to try INSPECT with NOPRESERVE.


Note: The HEADRANGE parameter should only specify the same value for the beginning and end of the range, for example: 
HEADRANGE (X'03', X'03'). 
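Whether data checks "show a pattern" can be judged by tallying the failing tracks by head. The sketch below is one possible way to do that, under stated assumptions: the manual does not define a numeric rule for a pattern, so the 60 percent share used here (`min_share`) is an arbitrary illustration, and the function names are hypothetical. The input is a list of (cylinder, head) pairs read from the FAILURE AT ADDRESS lines of a report.

```python
from collections import Counter


def clustered_head(checks, min_share=0.6):
    """Return the head that dominates the data checks, or None.

    checks    -- iterable of (cylinder, head) integer pairs
    min_share -- fraction of all checks one head must account for
                 (an assumed cutoff, not a documented value)
    """
    heads = Counter(head for _cylinder, head in checks)
    if not heads:
        return None
    head, count = heads.most_common(1)[0]
    total = sum(heads.values())
    # A single isolated check is never treated as a pattern.
    if count > 1 and count / total >= min_share:
        return head
    return None


def headrange(head):
    """Format HEADRANGE with the same value at both ends, as the note requires."""
    return f"HEADRANGE(X'{head:02X}', X'{head:02X}')"


scattered = [(655, 0), (770, 1), (328, 2), (532, 7)]
clustered = [(100, 3), (200, 3), (300, 3), (655, 0)]
print(clustered_head(scattered))             # no dominant head
print(headrange(clustered_head(clustered)))  # operand for the dominant head
```

When the function returns a head, the pattern sequence (ANALYZE DRIVETEST SCAN with HEADRANGE) applies; when it returns None, the non-pattern sequence applies.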


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO 
parameter allows you to specify the proper volume of the pair. See “Performing 
Media Maintenance on Dual Copy Volumes” on page 59.


3375 and 3380 Condition 4 - Permanent Data Checks at 11 or More Tracks 


Device Support Your Response to Device 
Facilities Actions Support Facilities Actions 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO
parameter allows you to specify the proper volume of the pair. See “Performing
Media Maintenance on Dual Copy Volumes” on page 59.


Error Handling for 3390


Special Instructions 
Each media SIM listed in the Service Information Messages report shows the failing 
track address and appropriate media maintenance procedure number necessary to 
correct the media problem. If the SIM is listed as a SERVICE ALERT on the Service 
Information Messages report, contact your IBM service representative. This section 
discusses error handling guidelines for the MEDIA ALERT only. The following 
example shows a MEDIA ALERT listed on the Service Information Messages report. 


093/89 16:31:23:38 093/89 16:31:23:38 
MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 4380-E081-2585 ID=08 
TEMPORARY DATA CHECK(S) ON SSID 0040, VOLSER JES8E5 DEV 08E5, 59


PHYSICAL DEVICE 25, CYLINDER 0669 TRACK 03 
REFERENCE MEDIA MAINTENANCE PROCEDURE 5 
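The fields that matter for media maintenance can be pulled out of a MEDIA ALERT such as the one above with a short script. This is only a sketch: the function name and the regular expressions are assumptions based on the single example shown, not a description of the full SIM format. The cylinder and track are kept as the strings printed in the report, because the example alone does not establish their number base. The procedure summaries are paraphrased from the procedure descriptions in this section.

```python
import re

# Short summaries taken from the procedure descriptions in this section.
PROCEDURES = {
    1: "action is required on all tracks of the volume",
    3: "comprehensive analysis on a specific track, plus tracks in the vicinity",
    5: "tracks in the vicinity of the specific track should be checked",
    7: "comprehensive analysis on a specific track",
    9: "the specific track is checked",
}

SIM_TEXT = """\
MEDIA ALERT 3390-02 S/N 0113-14172 REFCODE 4380-E081-2585 ID=08
TEMPORARY DATA CHECK(S) ON SSID 0040, VOLSER JES8E5 DEV 08E5, 59
PHYSICAL DEVICE 25, CYLINDER 0669 TRACK 03
REFERENCE MEDIA MAINTENANCE PROCEDURE 5
"""


def parse_media_alert(text):
    """Return the media maintenance fields of a MEDIA ALERT, or None.

    A SERVICE ALERT (or any other text) returns None, because those are
    referred to the service representative rather than handled here.
    """
    if "MEDIA ALERT" not in text:
        return None
    patterns = {
        "volser": r"VOLSER\s+(\S+)",
        "cylinder": r"CYLINDER\s+([0-9A-F]+)",
        "track": r"TRACK\s+([0-9A-F]+)",
        "procedure": r"MEDIA MAINTENANCE PROCEDURE\s+(\d+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found: {name}")
        fields[name] = match.group(1)
    fields["procedure"] = int(fields["procedure"])
    return fields


alert = parse_media_alert(SIM_TEXT)
print(alert)
print(PROCEDURES.get(alert["procedure"], "unknown procedure number"))
```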


When viewing the SIM, check the media maintenance procedure number and find 
the corresponding media maintenance procedure that is described in the following 
tables. This will tell you which Device Support Facilities action to take. The media 
maintenance procedures are numbered 1, 3, 5, 7 and 9. The 3390 device is capable 
of operating in 3390 mode or 3380 track compatibility mode. Media maintenance is 
performed in the same manner, regardless of the mode of the device. For more 
information on the modes a 3390 can operate in, see: 


Using the IBM 3390 Direct Access Storage in an MVS Environment 
Using the IBM 3390 Direct Access Storage in a VM Environment 
Using the IBM 3390 Direct Access Storage in a VSE Environment


3390 Media Maintenance Procedure 1
Conditions indicate that action is required on all tracks of the volume.


This procedure should not be used with the “system-assisted facility” for
performing media maintenance since the volume must be off-line to the
operating system, and all data on the volume will be destroyed.

Your Actions

Use appropriate utilities or program to copy data from volume temporarily to
another volume.

Use the following Device Support Facilities command:

   REVALIDATE

Device Support Facilities Actions

Exercises hardware and performs all functions necessary to ensure that the
device is now operating properly.

Your Response to Device Support Facilities Actions

Run minimal INIT to create volume label, volume table of contents (VTOC),
etc. Restore data from the temporary copy.


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


The REVALIDATE command does not support dual copy volumes. A dual copy 
volume must be put into simplex state before using REVALIDATE.


3390 Media Maintenance Procedure 3
Comprehensive analysis is required on a specific track. The error might be 
temporary or permanent. In addition, tracks in the vicinity of the specific track 
should be checked, and a comprehensive test done for them if necessary. 


This procedure should not be used with the automated capability for performing 
media maintenance since data on the track will be destroyed. 


Your Actions

   ANALYZE DDNAME(dname) +
      DRIVETEST NOSCAN
   IF LASTCC < 8 +
   THEN +
   DO
      INSPECT SKIP TRACK(X'cccc' X'hhhh') +
         CHECK(2) ASSIGN PRESERVE +
         DDNAME(dname) NOVERIFY
      IF LASTCC = 8 +
      THEN +
      INSPECT SKIP TRACK(X'cccc' X'hhhh') +
         CHECK(2) ASSIGN PRESERVE +
         DDNAME(dname) NOVERIFY

      INSPECT NOSKIP +
         HEADRANGE(X'hhhh', X'hhhh') +
         CYLRANGE(X'cccc' + 1, X'cccc' + 8) +
         CHECK(2) ASSIGN PRESERVE +
         DDNAME(dname) NOVERIFY

      INSPECT NOSKIP +
         HEADRANGE(X'hhhh', X'hhhh') +
         CYLRANGE(X'cccc' - 8, X'cccc' - 1) +
         CHECK(2) ASSIGN PRESERVE +
         DDNAME(dname) NOVERIFY
   END


Device Support Facilities Actions

Exercises hardware.

If ANALYZE test detects hardware problem, issues diagnostic message “Suspected
Drive Problem.”

If ANALYZE test does not detect hardware problem, executes INSPECT.

Concurrently preserves data from track. Checks track surface. Skips defects.
Assigns alternate track if necessary. If data is preserved, restores data.

Concurrently preserves data from each track as it is processed. Performs a
primary surface check on all tracks for head hhhh for all cylinders specified.
If primary surface checking suspects a problem, it checks the track’s surface
and assigns defects. Assigns alternate track if necessary. Restores data.

Concurrently preserves data from each track as it is processed. Performs a
primary surface check on all tracks for head hhhh for all cylinders specified.
If primary surface checking suspects a problem, it checks the track’s surface
and assigns defects. Assigns alternate track if necessary. Restores data.


Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, call your service
representative for a possible hardware problem.

If INSPECT executes and data was not preserved, restore data from backup
copy created before the media problem occurred, and update as needed.

If data could not be preserved, you may want to try INSPECT with
NOPRESERVE.

If data could not be preserved, you may want to try INSPECT with
NOPRESERVE.


Note: The HEADRANGE parameter should only specify the same value for the beginning and end of the range, for example: 
HEADRANGE (X'03', X'03'). Also, cylinder parameters cannot be entered exactly as specified, for example: 
CYLRANGE(X'cccc' +1, X'cccc' +8). The calculation must be performed first, then enter the result.
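Procedure 3 checks the eight cylinders on each side of the failing track but not the failing cylinder itself, so two separate ranges have to be worked out by hand. A minimal sketch of that calculation follows. The function name, the `span` and `max_cyl` parameters, and the choice to return None when a side has no cylinders are all assumptions made for illustration.

```python
def neighbor_cylranges(cyl, span=8, max_cyl=None):
    """Return (above, below) CYLRANGE operands around cylinder `cyl`.

    above covers cyl+1 through cyl+span, below covers cyl-span through cyl-1.
    A side is None when no valid cylinders exist on it, for example below
    cylinder 0. max_cyl is a hypothetical device limit supplied by the caller.
    """
    def fmt(low, high):
        return f"CYLRANGE(X'{low:04X}',X'{high:04X}')"

    high_end = cyl + span if max_cyl is None else min(cyl + span, max_cyl)
    above = fmt(cyl + 1, high_end) if high_end >= cyl + 1 else None

    low_start = max(cyl - span, 0)
    below = fmt(low_start, cyl - 1) if cyl - 1 >= low_start else None
    return above, below


above, below = neighbor_cylranges(0x0100)
print(above)
print(below)
```

Each non-None operand replaces the symbolic CYLRANGE expression in the corresponding INSPECT NOSKIP statement; a None result means that statement can be left out for that side.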


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO 
parameter allows you to specify the proper volume of the pair. See “Performing 
Media Maintenance on Dual Copy Volumes” on page 59.


3390 Media Maintenance Procedure 5


Tracks in the vicinity of the specific track should be checked, and a comprehensive 
test done for them if necessary. 


This procedure should not be used with the “system-assisted facility” for 
performing media maintenance since data on the track will be destroyed. 


Your Actions

   ANALYZE DRIVETEST NOSCAN
   IF LASTCC < 8 +
   THEN +
   INSPECT NOSKIP +
      HEADRANGE(X'hhhh', X'hhhh') +
      CYLRANGE(X'cccc' - 8, X'cccc' + 8) +
      CHECK(2) +
      ASSIGN +
      PRESERVE

Device Support Facilities Actions

Exercises hardware.

If ANALYZE test detects hardware problem, issues diagnostic message “Suspected
Drive Problem.”

If ANALYZE test does not detect hardware problem, executes INSPECT.

Concurrently preserves data from each track as it is processed. Performs a
primary surface check on all tracks for head hhhh for all cylinders specified.
If primary surface checking suspects a problem, it checks the track’s surface
and assigns defects. Assigns alternate track if necessary. Restores data.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, call your service
representative for a possible hardware problem.

If data could not be preserved, you may want to try INSPECT with NOPRESERVE.


Note: The HEADRANGE parameter should only specify the same value for the beginning and end of the range, for example: 
HEADRANGE (X'03', X'03'). Also, cylinder parameters cannot be entered exactly as specified, for example: 
CYLRANGE(X'cccc' +1, X'cccc'+8). The calculation must be performed first, then enter the result. 


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO
parameter allows you to specify the proper volume of the pair. See “Performing
Media Maintenance on Dual Copy Volumes” on page 59.


3390 Media Maintenance Procedure 7
Comprehensive analysis is required on a specific track. The error might be
temporary or permanent. 


This procedure should not be used with the “system-assisted facility” for 
performing media maintenance since data on the track will be destroyed. 


Your Actions

Important: If the data check is permanent, it is likely that the command
sequence shown below will fall into the INSPECT NOPRESERVE portion. Thus,
your data will be lost and you will need to rely on other backup copies to
recover. Use the following Device Support Facilities command sequence:

   ANALYZE DRIVETEST NOSCAN
   IF LASTCC < 8 +
   THEN +
   INSPECT SKIP TRACK(X'cccc' X'hhhh') +
      CHECK(2) +
      ASSIGN +
      PRESERVE
   IF LASTCC = 8 +
   THEN +
   INSPECT SKIP TRACK(X'cccc' X'hhhh') +
      CHECK(2) +
      ASSIGN +
      NOPRESERVE

Device Support Facilities Actions

Exercises hardware.

If ANALYZE test detects hardware problem, issues diagnostic message “Suspected
Drive Problem.”

If ANALYZE test does not detect hardware problem, executes INSPECT.

Concurrently preserves data from the track as it is processed. Checks track
surface. Skips defects. Assigns alternate track if necessary. Restores data.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, call your service
representative for a possible hardware problem.

If INSPECT executes and data was not preserved, restore data from backup
copy created before the media problem occurred, and update as needed.


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO 
parameter allows you to specify the proper volume of the pair. See “Performing 
Media Maintenance on Dual Copy Volumes” on page 59.


3390 Media Maintenance Procedure 9
The specific track is checked, and a comprehensive test done if necessary. 


Your Actions

   ANALYZE DRIVETEST NOSCAN
   IF LASTCC < 8 +
   THEN +
   INSPECT NOSKIP +
      TRACK(X'cccc' X'hhhh') +
      CHECK(2) +
      ASSIGN +
      PRESERVE

Device Support Facilities Actions

Exercises hardware.

If ANALYZE test detects hardware problem, issues diagnostic message “Suspected
Drive Problem.”

If ANALYZE test does not detect hardware problem, executes INSPECT.

Concurrently preserves data from each track as it is processed. Performs a
primary surface check on all tracks for head hhhh for all cylinders specified.
If primary surface checking suspects a problem, it checks the track’s surface
and assigns defects. Assigns alternate track if necessary. Restores data.

Your Response to Device Support Facilities Actions

If ANALYZE issues the “Suspected Drive Problem” message, call your service
representative for a possible hardware problem.

If data could not be preserved, you may want to try INSPECT with NOPRESERVE.


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


If the volume requiring media maintenance is part of a dual copy pair, the DIRECTIO
parameter allows you to specify the proper volume of the pair. See “Performing
Media Maintenance on Dual Copy Volumes” on page 59.


Error Handling for 9332


The 9332 contains sophisticated error detection, thresholding, isolation, and
reporting that helps to reduce your involvement in maintenance. Media and
hardware errors are detected and tracked. Faulty field replaceable units (FRUs)
requiring replacement are reported to you using the System Reference Code. The
9332 aids maintenance by monitoring media and hardware errors and reporting
them when the respective threshold is exceeded. The error thresholds cannot be
updated by the user.

As a result of the error handling capabilities of the 9332, you need only be
concerned with the following conditions:

• Field replaceable unit (FRU) replacement
• Alternate block assignment
• File backup


9332 Condition 1 - Field Replaceable Unit (FRU) Replacement

9332 Actions

A Field Replaceable Unit (FRU) needs to be replaced. You receive System
Reference Code Condition 1.

Your Response to 9332 Actions

Call the IBM service representative, and provide the System Reference Code (1).

VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9332 Condition 2 - Alternate Block Assignment

9332 Actions

The 9332 determines that an alternate block should be assigned because of a
media defect. Usually the data is readable because the 9332 detects defect
trends and recommends alternate block assignment before an unrecoverable data
error occurs.

Your Response to 9332 Actions

Use the following Device Support Facilities command sequence to assign an
alternate block:

   INSPECT BLOCK(rbn) -
      CHECK(n) -
      ASSIGN -
      PRESERVE

If PRESERVE fails, you can retry this step with NOPRESERVE. You will then need
to restore your data from a backup copy.

If the alternate block assignment fails, proceed as in Condition 3.

VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9332 Condition 3 - File Backup


9332 Actions

When a condition exists that threatens data (that is, an alternate block cannot
be assigned), this action is required. This recommendation means that all
nearby alternates have been used. This does not imply that the assignment of
other alternate blocks will be unsuccessful.

Your Response to 9332 Actions

Use the appropriate system utility to back up the file contents, and contact
your IBM service representative.


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


Error Handling for 9335


Special Instructions 
The 9335 counts the number of times that recoverable data checks occur for each 
device. When a predetermined number is exceeded, error information for this 
occurrence and the next seven occurrences is logged in EREP. On the eighth 
recoverable error, the EREP Informational Messages Report contains the message: 


THRESHOLD LOGGING COMPLETE FOR DATA CHECKS 


Recoverable errors that occur when the threshold has not been exceeded will not be 
reported to the system. 
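The threshold logging behavior described above can be pictured with a small simulation. This is a sketch under stated assumptions, not a model of the actual 9335 microcode: the threshold value is internal to the device and is passed in here as a hypothetical parameter, the class and method names are invented for illustration, and what the device does after the completion message is not stated in this section, so the sketch simply stops reporting.

```python
class ThresholdLogger:
    """Simulate threshold logging of recoverable data checks for one device."""

    # The occurrence that exceeds the threshold plus the next seven.
    LOGGED_OCCURRENCES = 8

    def __init__(self, threshold):
        self.threshold = threshold   # hypothetical; the real value is predetermined
        self.count = 0               # recoverable data checks seen so far
        self.logged = 0              # occurrences logged after the threshold
        self.complete = False

    def recoverable_data_check(self):
        """Record one recoverable data check; return what is reported for it."""
        self.count += 1
        # At or below the threshold nothing is reported to the system.
        # After the completion message the sketch reports nothing further
        # (an assumption, since the text does not say).
        if self.count <= self.threshold or self.complete:
            return []
        self.logged += 1
        reported = [f"logged occurrence {self.count}"]
        if self.logged == self.LOGGED_OCCURRENCES:
            self.complete = True
            reported.append("THRESHOLD LOGGING COMPLETE FOR DATA CHECKS")
        return reported


device = ThresholdLogger(threshold=3)
for _ in range(12):
    for line in device.recoverable_data_check():
        print(line)
```

With a threshold of 3, occurrences 1 to 3 produce nothing, occurrences 4 to 11 are logged, and the completion message accompanies occurrence 11.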


The 9335 has a Sector Read Retry counter. During internal recovery of data checks, 
a count is kept of the number of retries performed. If a predetermined number is 
exceeded, the EREP Informational Messages Report includes the message: 


SECTOR RETRY THRESHOLD EXCEEDED AT BLOCK (RBN) 


The “Threshold Logging” message takes precedence over the “Sector Retry 
Exceeded” message.
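The 9335 conditions described in the rest of this section can be summarized as a small selection routine. The sketch below is an interpretation of the condition headings, not an IBM-supplied rule: the function name and parameters are hypothetical, and checking permanent data checks before the threshold messages is an assumption about ordering. The one ordering rule taken directly from the text is that the threshold logging message takes precedence over the sector retry message. Condition 7 is not selected here because it is reached only when an alternate block assignment fails during one of the other conditions.

```python
def select_9335_condition(permanent, blocks, threshold_devices=0,
                          sector_retry=False):
    """Return the 9335 condition number that applies, or None.

    permanent         -- True for permanent data checks, False for temporary
    blocks            -- number of blocks with data checks
    threshold_devices -- number of devices on the same A-unit for which the
                         threshold logging message appeared
    sector_retry      -- True if the sector retry message appeared
    """
    if permanent:
        if blocks < 1:
            raise ValueError("a permanent data check needs at least one block")
        return 2 if blocks <= 2 else 3
    # Threshold logging takes precedence over the sector retry message.
    if threshold_devices > 1:
        return 1
    if threshold_devices == 1:
        if blocks < 1:
            raise ValueError("a temporary data check needs at least one block")
        return 4 if blocks <= 3 else 5
    if sector_retry:
        return 6
    return None


print(select_9335_condition(permanent=False, blocks=2, threshold_devices=1))
```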


9335 Condition 1 - Temporary Data Checks on Multiple Devices 
Occurs in connection with the message THRESHOLD LOGGING COMPLETE FOR
DATA CHECKS appearing for more than one device attached to the same Model
A01.


Device Support Your Response to Device 
Your Actions Facilities Actions Support Facilities Actions . 


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9335 Condition 2 - Permanent Data Checks at 1 or 2 Blocks


Your Actions

Important: Because the data check is permanent, it is likely that the command
sequence shown below will fall into the INSPECT NOPRESERVE portion. Thus,
your data will be lost and you will need to rely on other backup copies to
recover.

Use the following Device Support Facilities command sequence for each block:

   ANALYZE NOSCAN
   IF LASTCC < 8 -
   THEN -
   INSPECT BLOCK(rbn) -
      CHECK(3) -
      ASSIGN -
      PRESERVE
   IF LASTCC = 8 -
   THEN -
   INSPECT BLOCK(rbn) -
      CHECK(3) -
      ASSIGN -
      NOPRESERVE

Device Support Facilities Actions

If ANALYZE does not detect a hardware problem, INSPECT is executed.
Preserves data from block if readable. Checks block surface. Assigns
alternate block if necessary. If data was preserved, restores data.

Your Response to Device Support Facilities Actions

If ANALYZE indicates that a possible hardware problem exists, call for a
service representative.

If the data on the failing block was not preserved, it will be necessary to
restore it from a backup copy.

If INSPECT ASSIGN fails because there are no spare alternate blocks left for
reallocation, or there is a Missing Interrupt Handler timeout, run diagnostic
test 10. If the diagnostic test fails, call your IBM service representative;
otherwise, proceed as in Condition 7.


VM users operating on volumes that are not attached to a userid should see 
“Performing Media Maintenance in VM” on page 59. 


9335 Condition 3 - Permanent Data Check at 3 or More Blocks 


Device Support Your Response to Device 
Your Actions Facilities Actions Support Facilities Actions 


Call your IBM service representative, 
and attempt to copy all data to another 
volume. Note that data from both 
devices of the model B01 should be 
backed up if possible. 


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9335 Condition 4 - Temporary Data Checks at 1 to 3 Blocks


Occurs in connection with “Threshold Exceeded” message for one device. 


Your Actions

   ANALYZE NOSCAN
   IF LASTCC < 8 -
   THEN -
   INSPECT BLOCK(rbn) -
      CHECK(3) -
      ASSIGN -
      PRESERVE

Device Support Facilities Actions

If ANALYZE does not detect a hardware problem, INSPECT is executed.
Preserves data from block if readable. Checks block surface. Assigns
alternate block if necessary. Restores data.

Your Response to Device Support Facilities Actions

If ANALYZE indicates that a possible hardware problem exists, call for an IBM
service representative.

If INSPECT ASSIGN fails because there are no spare alternate blocks left for
reallocation or there is a Missing Interrupt Handler timeout, run diagnostic
test 10. If the diagnostic test fails, call your IBM service representative;
otherwise, proceed as in Condition 7.


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9335 Condition 5 - Temporary Data Check at More Than 3 Blocks 


Occurs in connection with “Threshold Exceeded” message for one device. 


Device Support Your Response to Device 
Your Actions Facilities Actions Support Facilities Actions


Call your IBM service representative, None 
and attempt to copy all data to another 
volume. Note that data from both 
devices of the model B01 should be 
backed up if possible. 


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9335 Condition 6 - Sector Read Retry Exceeded


Your Actions

   INSPECT BLOCK(rbn) -
      CHECK(3) -
      ASSIGN -
      PRESERVE

Device Support Facilities Actions

Preserves data from block. Checks surface of block, flags defects, and
assigns alternate block. Restores data.

Your Response to Device Support Facilities Actions

If INSPECT fails because there are no spare alternate blocks left for
reallocation or there is a Missing Interrupt Handler timeout, run diagnostic
test 10. If the diagnostic test fails, call your IBM service representative;
otherwise, execute the Device Support Facilities RECLAIM function. If INSPECT
does not execute successfully for any other reason, call your IBM service
representative.


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


9335 Condition 7 - Alternate Block Assignment Fails 


Occurs in connection with the message THRESHOLD LOGGING COMPLETE FOR 
DATA CHECKS appearing for more than one device attached to the same Model 
A01. 


Your Actions

   INIT CHECK(3) -
      RECLAIM

Device Support Facilities Actions

Reclaims all primary blocks that are not experiencing data checks.

Your Response to Device Support Facilities Actions

Restore data.

If condition persists, contact your IBM service representative.


VM users operating on volumes that are not attached to a userid should see
“Performing Media Maintenance in VM” on page 59.


Acronyms and Abbreviations


This list contains definitions for acronyms and 
abbreviations used in the various books in the Storage 
Subsystem Library. Some terms are more specifically 
defined in the glossary. 


CCH      Channel-check handler
CCHH     Cylinder, cylinder, head, head
CCHHR    Cylinder, cylinder, head, head, record
CHL-I    Channel interface
CHPID    Channel path identifier
CKD      Count-key-data
CMS      Conversational Monitor System
CP       Control program
DASD     Direct access storage device
DD       Data definition
DFDSS    Data Facility Data Set Services
DLS      Device level selection
DLSE     Device level selection enhanced
DPS      Dynamic path selection
ERDS     Error recording data set
EREP     Environmental Record Editing and Printing program
ESQA     Extended system queue area
FBA      Fixed-block architecture
FRU      Field replaceable unit
HA       Home address
HDA      Head-disk assembly
ID       Identifier
IML      Initial microcode load
I/O      Input/output
JCL      Job control language
MDR      Miscellaneous data record
OBR      Outboard recorder
OS       Operating system
R0       Record zero
SIM      Service information message
SLH      Subchannel-logout record
SMS      Storage Management Subsystem
SSID     Subsystem identifier
TSO      Time sharing option
VTOC     Volume table of contents




116 Maintaining IBM Storage Subsystem Media 


Glossary 


This glossary contains disk storage subsystem terms 
used in the various books of the Storage Subsystem 
Library (SSL). 


Not every term included here is used in this specific book. If you do not
find the term you are looking for, refer to the index or to the Dictionary of
Computing, SC20-1699.


A 


A-unit. The direct access storage unit that contains the
controller functions to attach to the storage control. An
A-unit controls the B-units that are attached to it and is
often referred to as a head of string.


access mechanism. See actuator. 


actuator. A set of access arms and their attached 
read/write heads, which move as an independent 
component within a head-disk assembly (HDA). See 
also device and volume. 


alternate track. On a direct access storage device, a 
track designated to contain data in place of a defective 
primary track.


B


B-unit. A direct access storage unit that attaches to
the subsystem through an A-unit. 


C 


C-unit. A direct channel attach 3380 direct access 
storage unit that contains both the storage control 
functions and the DASD controller functions. A 3380 
C-unit (3380 Model CJ2) functions as a head of string
and controls the B-units that are attached to it. 


cache. A random access electronic storage in 
selected storage controls used to retain frequently 
used data for faster access by the channel. For 
example, the 3990 Model 3 contains cache. 


channel interface (CHL-I). The circuitry of a storage 
control that attaches storage paths to a host channel. 


cluster. See storage cluster.


concurrent media maintenance. The capability that
enables a customer to perform maintenance on a track
while allowing user access to that data.


control unit. A hardware unit that controls the reading,
writing, or displaying of data at one or more 
input/output devices. See also storage control. 


controller. The hardware component of a DASD head 
of string unit that provides the path control and data 
transfer functions. For example, 3390 A-units have four 
controllers, and there are two controllers in a 3380 
Model AE4, AK4, or CJ2. See also device adapter. 


count-key-data (CKD). A DASD data recording format 
employing self-defining record formats in which each 
record is represented by a count area that identifies the 
record and specifies its format, an optional key area 
that may be used to identify the data area contents, and 
a data area that contains the user data for the record. 
CKD is also used to refer to a set of channel commands 
that are accepted by a device that employs the CKD 
recording format. See also extended count-key-data. 


D 


DASD. Direct access storage device. 


DASD subsystem. A storage control and its attached 
direct access storage devices. 


device. A uniquely addressable part of a DASD unit
that consists of a set of access arms, the associated
disk surfaces, and the electronic circuitry required to
locate, read, and write data. See also volume.


device adapter (DA). The hardware component of a 
3390 head of string unit that provides the path control 
and data transfer functions. See also controller. 


device ID. An 8-bit identifier that uniquely identifies a 
physical I/O device. 


device level selection (DLS). A DASD function 
available with 3380 Models AD4, BD4, AE4, BE4, AJ4, 
BJ4, AK4, BK4, and CJ2. With DLS, each of the two 
controllers in the DASD string has a path to all devices 
in the string, and any two devices in the 2-path DASD 
string can read or write data simultaneously. See DLS 
mode. 


device level selection enhanced (DLSE). A DASD 
function providing four data transfer paths to each 
device in a 4-path DASD string. With DLSE, any four 
devices in a 4-path DASD string can read or write data 
simultaneously. See DLSE mode. 


Device Support Facilities program (ICKDSF). A
program used to initialize DASD at installation and
provide media maintenance.




director. See storage director. 


dual copy. A high availability function made possible 
by nonvolatile storage in a 3990 Model 3. Dual copy 
maintains two functionally identical copies of 
designated DASD volumes in the logical 3990 Model 3 
subsystem, and automatically updates both copies 
every time a write operation is issued to the dual copy 
logical volume. 


duplex state. Two devices in a 3990 Model 3 
subsystem are in duplex state when they have been 
made into a dual copy logical volume. 


dynamic path selection (DPS). DASD subsystem 
functions available with all 3380 heads of string except 
Model A04. These functions include: 


• Two controllers providing data paths from the 3380
strings to the storage directors

• Simultaneous transfer of data over two paths to two
devices, providing the two devices are on separate
internal paths within the string

• Sharing DASD volumes by using System-Related
Reserve and Release

• Providing dynamic path reconnect to the first
available path.


E 


Environmental Record Editing and Printing (EREP) 
program. The program that formats and prepares 
reports from the data contained in the error recording 
data set (ERDS). 


error recording data set (ERDS). The area in which 
error records are logged. ERDS information is stored 
in SYS1.LOGREC by MVS, in SYSREC by VSE, and in 
the error recording area by VM. 


extended count-key-data. A set of channel commands 
that use the CKD track format. Extended 
count-key-data uses the Define Extent and Locate 
Record commands to describe the nature and scope of 
a data transfer operation to the storage control to 
optimize the data transfer operation. The 3990 Storage 
Control supports the extended count-key-data 
commands. 


F 


fence. To separate one or more paths or elements 
from the remainder of the logical DASD subsystem. 
The separation is by logical boundaries rather than 
power boundaries. This separation allows isolation of 
failing components so that they do not affect normal 
operations. 


H


head-disk assembly (HDA). A field replaceable unit in 
a direct access storage device containing the disks and 
actuators. 


head of string. The unit in a DASD string that contains 
controller functions. Also called the A-unit. See also 
device adapter. 


home address (HA). The first field on a CKD track that 
identifies the track and defines its operational status. 
The home address is written after the index point on 
each track. 


ICKDSF. See Device Support Facilities program. 


IDCAMS. A component of Data Facility Product that is 
also referred to as access method services. 


identifier (ID). A sequence of bits or characters that 
identifies a program, device, controller or system. 


index point. The reference point on a disk surface that 
determines the start of a track. 


initial microcode load (IML). The act of loading 
microcode. 


media. The disk surface on which data is stored. 


media SIM. A message generated when 3390 detects 
a device media fault that requires media maintenance. 
See also service information message (SIM). 


nonsynchronous operation. A type of operation in 
which the channel and storage control activities 
required to end one command and initiate the next do 
not necessarily occur within the inter-record gap 
between two adjacent fields. With nonsynchronous 
operations, the channel can be slower than the device 
on reads, and faster than the device on writes. The 
time difference in processing a channel program will 
be as a result of the current operating environment 
rather than on a property of the device or storage 
control. Contrast with synchronous operation. 


P 


physical ID. A unique designation to identify specific 
components in a data processing complex. 


primary track. On a direct access storage device, the 
original track on which data is stored. See also 
alternate track. 


R 


release. A facility that allows other host systems to 
communicate with the reserved device. Contrast with 
reserve. 


resume. A function on a 3990 Model 2 or 3 Storage 
Control in DLSE mode, configured with only 4-path 
strings. This function enables a component that has 
been quiesced. This function is initiated by a service 
representative. Contrast with quiesce. 


S 


service information message (SIM). A message that 
appears on the operator console and in EREP reports, 
generated by a 3990, a 3380 Model CJ2, or a 3390, that 
contains notification of a need for repair or customer 
action. The SIM identifies the affected area of the 
storage control or device and the effect of the service 
action. See also media SIM. 


SIM Alert. An operator console message that alerts 
the operator that an action requiring attention has 
occurred. The service information message (SIM) can 
be obtained from the EREP exception report. 


simplex state. A volume is in the simplex state if it is 
not part of a dual copy logical volume. Terminating a 
dual copy logical volume returns the two devices to the 
simplex state. In this case, there is no longer any 
capability for either automatic updates of the secondary 
device or for logging changes, as would be the case in 
suspended duplex state. 


storage cluster. In the 3990 Storage Control and 3380 
Model CJ2, a power and service region containing two 
independent transfer paths. See also storage director, 
single-path storage director, and multipath storage 
director. 


storage control. The component in a DASD subsystem 
that connects the DASD to the host channels. It 
performs channel commands and controls the DASD 
devices. For example, the 3990 Model 2 and Model 3 
are storage controls. 


storage director. In a 3990 storage control, a logical
entity consisting of one or more physical storage paths
in the same storage cluster. In a 3880, a storage
director is equivalent to a storage path. See also 
storage path, single-path storage director, and 
multipath storage director. 


storage management subsystem (SMS). An operating 
environment that helps automate and centralize the 
management of storage. To manage storage, SMS 
provides the storage administrator with control over 
data class, storage class, management class, storage 
group, and automatic class selection routine 
definitions. 


storage path. The hardware within the 3990 Storage 
Control that transfers data between the DASD and a 
channel. See also storage director. 


storage subsystem. A storage control and its attached 
storage devices. 


string. A series of connected DASD units sharing the 
same A-unit (or head of string). 


subsystem identifier (SSID). In a 3990 Storage Control 
configuration, a number that identifies the physical 
components of a logical DASD subsystem. This 
number is set by the service representative at the time 
of installation, and is included in the vital product data
in the support facility. This number is identified on the 
DASD A-units and 3990 operator panels. 


subsystem. See DASD subsystem or storage 
subsystem. 


subsystem storage. A term used for cache in a 3880 
Model 13 or 23. See cache. 


suspended duplex state. The state of a dual copy logical
volume when only one of its devices is being updated
because of either a permanent error condition or an
authorized user command. All writes to the remaining
functional device are logged. This allows for automatic
resynchronization of both volumes when the dual copy
logical volume is reset to the active duplex state.
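
The relationship among the simplex, duplex, and suspended duplex states defined in this glossary can be sketched as a small state model. This is only an illustration under stated assumptions: the class, method names, and track-keyed dictionaries are hypothetical, the surviving device is always modeled as the primary, and nothing here reflects the actual 3990 Model 3 microcode or command set.

```python
class DualCopyVolume:
    """Toy model of a dual copy logical volume (all names are hypothetical)."""

    def __init__(self):
        self.primary = {}            # track number -> data on the primary device
        self.secondary = {}          # track number -> data on the secondary device
        self.state = "duplex"        # "duplex", "suspended duplex", or "simplex"
        self.changed_tracks = set()  # log of tracks written while suspended

    def write(self, track, data):
        self.primary[track] = data
        if self.state == "duplex":
            # Both copies are updated on every write.
            self.secondary[track] = data
        elif self.state == "suspended duplex":
            # Only one device is updated; the change is logged for later.
            self.changed_tracks.add(track)
        # In simplex state there is no secondary update and no logging.

    def suspend(self):
        if self.state != "duplex":
            raise ValueError("only a duplex volume can be suspended")
        self.state = "suspended duplex"

    def resynchronize(self):
        if self.state != "suspended duplex":
            raise ValueError("volume is not in suspended duplex state")
        # Replay only the logged tracks, then return to duplex state.
        for track in self.changed_tracks:
            self.secondary[track] = self.primary[track]
        self.changed_tracks.clear()
        self.state = "duplex"

    def terminate(self):
        # Ending the dual copy pair returns the devices to simplex state.
        self.changed_tracks.clear()
        self.state = "simplex"


vol = DualCopyVolume()
vol.write(1, "a")
vol.suspend()
vol.write(2, "b")
vol.resynchronize()
print(vol.state, vol.secondary)
```

The logged set is what distinguishes suspended duplex from simplex: because changes are recorded, only the affected tracks need to be copied when the pair is re-established.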


T 


track compatibility mode. See 3380 track compatibility
mode.


U 


unit address. The last two hexadecimal digits of a DAS 
device address. This identifies the storage control and 
DAS string, controller, and device to the channel 
subsystem. Often used interchangeably with control 
unit address and device address in System/370 mode. 


vital product data (VPD). Nonvolatile data that includes
configuration data, machine serial number, engineering 
change level, and machine features. It is maintained 
by the 3990 support facility. It is stored in the 3990 
support facility and the 3390. 


volume. The DASD space accessible by a single 
actuator. 


3380 track compatibility mode. A mode of operation in
which a 3390 device manages its tracks as if they were
3380 tracks. Contrast with 3390 mode.


3390 mode. The mode of the actuator when the entire 
capacity of the 3390 device is initialized. Contrast with 
3380 track compatibility mode. 


4-path string. A series of physically connected DASD 
units in which the head of string provides four data 
transfer paths that can operate simultaneously. A 3390 
4-path string requires one A-unit, while two 3380 Model 
AJ4/AK4 units are required for a 3380 4-path string. 
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Direct Channel Attach Model CJ2 Channel Attach Model CJ2 Introduction information for 3380 Model CJ2 
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Introduction 
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3990 Storage Control Manuals 
Cache Device Administration GC35-0101 
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commands necessary to manage 
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functions 


Detailed information on installation 
and use of the 3990 storage control 
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IBM 3990 Storage Control Reference 


Introduction to Nonsynchronous 
Storage Subsystems 


IBM 3880 Storage Control Models 1, | 
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Access Storage Subsystems 
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Introduction to IBM 3880 Storage Introduction to IBM 3880 Storage GA32-0086 Overview of 3880 Model 23 functions 
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IBM 3370 Direct Access Storage 
Description 


IBM 3375 Direct Access Storage 
Description and User’s Guide 
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Customer and Service Information 
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Subsystem Customer Information 
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Access Storage 
ICKDSF User’s Guide and Reference (Device Support Facilities User’s Guide
and Reference), GC35-0033. Description of ICKDSF functions and
commands for DASD initialization and maintenance.
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Migration Guide 
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GG24-3373 


Provides guidelines and detailed 
procedures for moving MVS and VM 
data to 3390 from other DASD 


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Release 4 Guide Version 2, Release 4 commands 

NetView Customization: Writing Command Lists, SC31-6015. Shows
step-by-step instructions for writing command lists (CLISTs).

VM/SP HPO CP for System Programming (Release 5), SC19-6224. Discussion
of system programming tasks and commands, including CP INDICATE,
SYSOWN, MONITOR.

VM/XA SP Planning and Administration, GC23-0378. Discussion of VM/XA SP
hardware and software planning, system design, and system definition.
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CKD (count-key data) devices (continued) 
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record format 74 
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concurrent media maintenance 
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automating 38 
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DASD subsystem exception report (continued) 
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data check 
description 15 
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permanent 17 
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source 18 
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data record 75 
data transfer 
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multiple 72 
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description 79 
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ANALYZE SCAN 54, 56, 60 
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commands for use with 3380 57 
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dual copy 59 
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handling error situation 25 
if-then-else processing 58 
Service Information Messages report 25
recovery actions 53 
REVALIDATE command 69 
sample invocation EXEC 38 
using for DASD 37, 56
using to locate errors 55
DEVTYPE field 
DASD data transfer summary 51 
diagnostic aid, EREP 25 
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direct channel attach 79 
DIRECTIO parameter 59 
disk 
description 78 
packaging 72 
surface 72
disk media, as failing unit 37, 54 
as failing unit 37 
DASD alert 37 
MEDIA ALERT 37 
disk storage subsystem 
component identification 77 
drive mechanism 
description 78 


dual copy 59 
ECC 17, 22 


emulation mode 90 
equipment check 
description 15 
permanent 17 
temporary 17 
ERDS (error recording data set) 
data captured 25 
EREP 24 
permanent errors recorded 50 
system exception 25 
EREP 
description 24 
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reports 43 
asynchronous notification record detail 25, 28, 
33 
DASD data transfer summary 50
DASD subsystem exception 48
service information messages 28, 33
system error summary 46
system exception 25, 43 
system exception reports 
comparison table 46 
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use 25 
using to get a specific SIM 35
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accumulated 48 
attributes 15 
automatic recovery 21 
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category 56 
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confirm occurrence 37, 54 
correcting 17, 22 
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DASD data transfer summary 50 
DASD subsystem exception report 48 
data and control, recovering 22 
detecting 21, 22 
determine source 37, 54 
service information messages report 37 
generating records 21 
handling 37 
basic steps 53 
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overview 15 
3330 83-85 
3340 86-87 
3344 88-89
3350 90-93
3370 94-95
3375 96-101 
3380 96-101 
3390 102-107 
9332 108-109 
9335 110-113 
identifying 27, 43 
impact on data 20 
locating 37 
using DASD data transfer summary 55 
using Device Support Facilities 55 
using service information messages 37 
logging 24 
notification 21 
permanent 16, 17, 37, 46, 54, 56 
data 50 
messages at console 43
system error summary 46
3330 84, 85 
3340 86, 87 
3344 88, 89 
3350 92, 93
3370 94, 95 
3375 100, 101 
3380 100, 101 
9335 111 
perspective 16 
recovery 22, 37, 43, 56 
guidelines 21
system and subsystem 21 
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reviewing SIM data 27 
SIM Alert 28 
console message 28 
using the SIM Alert 28 
SIM Alert console message 27 
SIM generation 22 
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specifying limits 48 
storage 43 
subsystem handling 22 
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recovery 55 
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3340 87 
3344 89 
3350 91, 93
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3375 99, 101 
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9335 110, 112 
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type 15, 44 
system error summary 48
error handling 81 
basics 81 
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guidelines 81 
system error summary 47
generating asynchronous notification record detail
report 25
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guidelines 
error recovery 21, 81 
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hardware 

hardware failure 
DASD ALERT 27 

HDA (head-disk assembly) 
description 78 

HEAD field 
DASD data transfer summary 52 
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ANALYZE DRIVETEST 54, 59 
path control parameter 59 
ANALYZE SCAN 54, 56, 60 
automated capability 38 
commands for use with 3380 57 
concurrent media maintenance 36, 43, 53 
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handling error situation 25 
if-then-else processing 58 
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INSPECT command 62 
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invoking with PROP 41 
media maintenance actions, procedures 36, 37, 43 
media maintenance procedures 
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Service Information Messages report 25 
recovery actions 53 
REVALIDATE command 69 
sample invocation EXEC 38 
using for DASD 37, 56 
using to locate errors 55
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identification 
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identifier, physical 78 
identifying errors 43
identifying the need for service 27 
if-then-else processing 58 
impact of failure 27 
impact of repair 27 
on track 74
INIT command 
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using ICKDSF 57 
initializing a volume 69 
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and concurrent media maintenance 63 
PRESERVE parameter 64 
relevant parameters for rewriting data 65 
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INSPECT command (continued) 

relevant parameters for surface checking 64 

use in rewriting data 65 

use in surface checking 62 
INSTALL command 

how to use 68 

using to change modes on a 3390 68 
invocations of ANALYZE SCAN 61 
IODELAY command 

concurrent media maintenance 58
I/O address

description 77 

identifier 77 


JOBNAME field 
system error summary 47
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key area 75 
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limits, specifying 48 
logging subsystem 23 
logical record 73 


maximal initialization 
description 66 
media error 
MEDIA ALERT 27 
media maintenance 
automated capability 
using NetView 38 
VM PROP 41 
performing 37, 56 
performing on non-SIM DASD 27, 43 
performing on SIM DASD 27, 28 
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SIM severity reporting option 38 
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description 66 
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description 66 
mode 
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automated capability 38 
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subsystem exception report 49 
system error summary 47
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physical characteristics 
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DASD 71 
system error summary 47
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subsystem exception report 49 
system error summary 48
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example 41 
programming check, description 15 


R 


read/write head 71 
record 
blocked 73 
CKD devices 73, 74 
format 74 
logical 73 
physical 73 
selection 74 
data 75 
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track descriptor 75 
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recovery procedures 21 
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DASD subsystem exception 48-50 
obtaining 25 
system error summary 46-48
system exception 27, 46 
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system exception reports 25 
REVALIDATE command 
description 69 
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sector location 76 
sector read retry, 9335 113 
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DASD data transfer summary 52 
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sense information 
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definition 21 
overview 22
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DASD data transfer summary 52 
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service information message 
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Service Information Messages report 
example 33 
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how to invoke 64 
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time to perform 65 
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storage control 19 
as failing unit 18, 19, 54 
description 79 
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storage subsystem 
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errors 22 
physical components 78 
subsystem 
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subsystem exception DASD report 
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TOTALS field 
subsystem exception report 50 
track 
address 74, 76 
disk surface 72, 74 
format 
CKD devices 73, 74 
ECKD devices 74 
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